The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
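The "far" selection referenced throughout this log is a farthest-first traversal over the weight vectors. The tool's actual implementation is not shown; the following is a minimal sketch assuming Euclidean distance, with the function name `farthest_first_selection` being ours, not the tool's:

```python
import numpy as np

def farthest_first_selection(vectors, k, seed=0):
    """Farthest-first traversal: start from an arbitrary vector, then
    repeatedly add the vector whose minimum distance to the already
    selected set is largest. Returns the k selected vectors."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    n = len(vectors)
    selected = [int(rng.integers(n))]
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return vectors[selected]
```

This greedy strategy tends to pick vectors spread across the similarity space, which is why the samples above mix clear matches (all-high vectors) with clear non-matches (low second components).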
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
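The purity and entropy figures reported after each oracle round follow directly from the match / non-match counts: purity is the fraction of the majority class, and entropy is the binary Shannon entropy of the split. A minimal sketch (the function name `purity_entropy` is ours, not the tool's):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy (base 2) of the match / non-match proportions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

For the round above, `purity_entropy(23, 65)` reproduces the logged values: purity 65/88 ≈ 0.739 and entropy ≈ 0.829.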
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 956 non-matches
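The log states only that an SVM is trained on the oracle-labelled sample (matches vs. non-matches) and then applied to the remaining unlabelled weight vectors; the kernel and hyperparameters are not shown. A hedged sketch using scikit-learn, with a linear kernel chosen here purely as an assumption:

```python
import numpy as np
from sklearn.svm import SVC

def svm_classify_rest(train_vecs, train_labels, rest_vecs):
    """Train a binary SVM on the oracle-labelled sample and predict
    match (True) / non-match (False) for the remaining vectors."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(np.asarray(train_vecs), np.asarray(train_labels))
    return clf.predict(np.asarray(rest_vecs))
```

In the run above, a classifier fitted on 23 matches and 65 non-matches assigned all 956 remaining vectors to the non-match class.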
39.0
Analysing file: diverg(15)816_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (15, 1 - acm diverg, 816), dtype: object
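The precision, recall, and f-measure printed in the Series above are consistent with its raw counts: precision = tp/(tp+fp) = 95/97, recall = tp/(tp+fn) = 95/299, and F1 = 2tp/(2tp+fp+fn) = 190/396. A short sketch (the function name `prf` is ours):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure (F1) from raw match counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```

`prf(95, 2, 204)` reproduces the logged precision 0.979381, recall 0.317726, and f-measure 0.479798.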
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)816_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 723
Number of entity ID pairs that occurred more than once: 0
Analyse set of 723 weight vectors
Containing 170 true matches and 553 true non-matches
(23.51% true matches)
Identified 686 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 655 (95.48%)
2 : 28 (4.08%)
3 : 2 (0.29%)
6 : 1 (0.15%)
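The occurrence distribution above (655 vectors appearing once, 28 twice, and so on) can be computed by hashing each weight vector as a tuple and counting twice. A sketch using the standard library (function name ours):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map 'number of occurrences' -> 'number of distinct weight
    vectors that occur exactly that often'."""
    vector_counts = Counter(map(tuple, weight_vectors))
    return Counter(vector_counts.values())
```

For this file the result would be {1: 655, 2: 28, 3: 2, 6: 1}, which sums to the 686 unique vectors reported.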
Identified 0 non-pure unique weight vectors (from 686 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 153
0.000 : 533
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 723
Number of unique weight vectors: 686
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(686, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 686 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 686 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 26 matches and 58 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.893
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 602 weight vectors
Based on 26 matches and 58 non-matches
Classified 112 matches and 490 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
(490, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
Current size of match and non-match training data sets: 26 / 58
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 490 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 490 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.714, 0.727, 0.750, 0.294, 0.833] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.348, 0.429, 0.526, 0.529, 0.619] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 10 matches and 60 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 10
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analysing file: diverg(15)820_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 820), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)820_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 946
Number of entity ID pairs that occurred more than once: 0
Analyse set of 946 weight vectors
Containing 219 true matches and 727 true non-matches
(23.15% true matches)
Identified 891 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 855 (95.96%)
2 : 33 (3.70%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 891 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 706
Removed 1 non-pure weight vector
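As the pureness table above shows, one unique weight vector occurred with mixed labels (pureness 0.947, i.e. 18 matches and 1 non-match out of 19 occurrences), and only its minority-class instance was removed (946 → 945 vectors). A sketch of that clean-up step; the tie-breaking rule (ties count as matches) is our assumption:

```python
from collections import defaultdict

def remove_minority_labels(weight_vectors, labels):
    """For each unique weight vector, keep only instances carrying its
    majority label, so every remaining unique vector is pure."""
    groups = defaultdict(list)
    for vec, lab in zip(weight_vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, lab in zip(weight_vectors, labels):
        labs = groups[tuple(vec)]
        majority = sum(labs) >= len(labs) / 2  # assumption: ties -> match
        if lab == majority:
            kept.append((vec, lab))
    return kept
```

This guarantees the subsequent clustering only ever sees unique vectors whose true label is unambiguous.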
Final number of weight vectors to use: 945
Number of unique weight vectors: 891
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(891, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 891 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 891 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 805 weight vectors
Based on 24 matches and 62 non-matches
Classified 130 matches and 675 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(130, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(675, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 130 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 130 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(15)879_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 879), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)879_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 445
Number of entity ID pairs that occurred more than once: 0
Analyse set of 445 weight vectors
Containing 196 true matches and 249 true non-matches
(44.04% true matches)
Identified 421 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 404 (95.96%)
2 : 14 (3.33%)
3 : 2 (0.48%)
7 : 1 (0.24%)
Identified 0 non-pure unique weight vectors (from 421 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.000 : 247
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 445
Number of unique weight vectors: 421
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(421, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 421 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 421 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
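The "far" selection above is a farthest-first traversal: repeatedly take the vector whose minimum Euclidean distance to the already-selected set is largest. A minimal sketch, assuming the traversal seeds from the first vector (the real implementation may seed differently):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: pick k vectors that are
    mutually far apart under Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [0]  # seed with the first vector (an assumption)
    # minimum distance from every vector to the selected set so far
    min_dist = [dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):  # tighten distances to the new pick
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

pts = [[0.0], [10.0], [5.0], [1.0]]
print(farthest_first(pts, 3))  # [0, 1, 2]
```

The greedy update keeps the selection spread out, which is why the sampled vectors above mix clear matches, clear non-matches and borderline cases.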
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 37 matches and 41 non-matches
Purity of oracle classification: 0.526
Entropy of oracle classification: 0.998
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
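The purity and entropy figures reported for the oracle sample follow from its match/non-match split: purity is the majority-class fraction and entropy the binary entropy of the match proportion. A sketch assuming these standard definitions, which reproduce the logged values:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary
    entropy of the match proportion in the classified sample."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

print(purity_entropy(37, 41))  # purity ≈ 0.526, entropy ≈ 0.998, as logged
```

A pure cluster (all matches or all non-matches) gives purity 1.0 and entropy 0.0, which is the stopping condition the later "not pure enough" messages test against.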
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 343 weight vectors
Based on 37 matches and 41 non-matches
Classified 278 matches and 65 non-matches
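The SVM step uses the 37 + 41 oracle-labelled vectors as training data to split the remaining 343 vectors into two child clusters. As a stand-in sketch, a perceptron learns the same kind of linear match/non-match boundary (the real code trains an SVM; the toy vectors here are made up):

```python
def train_perceptron(X, y, epochs=100):
    """Learn a linear boundary from oracle-labelled vectors.
    y entries are +1 (match) or -1 (non-match)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        changed = False
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:          # misclassified: nudge the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                changed = True
        if not changed:                  # separable data: converged
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

# Hypothetical toy data: matches (+1) have a high second component.
X = [[1.0, 1.0], [0.9, 0.9], [1.0, 0.0], [0.8, 0.1]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print([predict(w, b, v) for v in [[0.95, 0.95], [0.9, 0.05]]])  # [1, -1]
```

Whatever linear model is used, the predicted matches and non-matches become the two child clusters queued in the next loop.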
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(278, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
(65, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
Current size of match and non-match training data sets: 37 / 41
Selected cluster with (queue ordering: random):
- Purity 0.53 and entropy 1.00
- Size 278 weight vectors
- Estimated match proportion 0.474
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 278 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.933, 1.000, 1.000, 1.000] (True)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 41 matches and 30 non-matches
Purity of oracle classification: 0.577
Entropy of oracle classification: 0.983
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 30
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(10)660_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 660), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)660_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 748
Number of entity ID pairs that occurred more than once: 0
Analyse set of 748 weight vectors
Containing 196 true matches and 552 true non-matches
(26.20% true matches)
Identified 706 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 671 (95.04%)
2 : 32 (4.53%)
3 : 2 (0.28%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 706 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.000 : 532
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 748
Number of unique weight vectors: 706
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(706, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 706 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 622 weight vectors
Based on 31 matches and 53 non-matches
Classified 284 matches and 338 non-matches
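Each split like the one above feeds two child clusters back into the queue, and the run stops when the manual classification budget is exhausted. A high-level sketch of that loop, simplified in two labelled ways: the sample is a plain prefix instead of farthest-first, and a 1-NN rule stands in for the SVM (the data is hypothetical):

```python
def cluster_splitting_loop(vectors, true_labels, budget, sample_size):
    """Sketch of the budget-limited loop: pop a cluster, oracle-label
    a sample, classify the rest with the labels gathered so far, and
    enqueue the two resulting child clusters."""
    queue = [list(range(len(vectors)))]     # start with one big cluster
    labelled = {}                           # index -> True/False from oracle
    used = 0
    while queue and used + sample_size <= budget:
        cluster = queue.pop(0)
        # prefix sample stands in for farthest-first selection
        sample, rest = cluster[:sample_size], cluster[sample_size:]
        for i in sample:                    # perfect (100% accuracy) oracle
            labelled[i] = true_labels[i]
        used += len(sample)

        def nn_label(i):                    # 1-NN stand-in for the SVM split
            best = min(labelled, key=lambda j: sum(
                (a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
            return labelled[best]

        for child in ([i for i in rest if nn_label(i)],
                      [i for i in rest if not nn_label(i)]):
            if child:
                queue.append(child)
    return labelled, queue

# Hypothetical 1-D weight vectors: matches sit near 1.0, non-matches near 0.0.
vecs = [[0.9], [0.95], [0.1], [0.05], [0.8], [0.2], [0.85], [0.15]]
labs = [True, True, False, False, True, False, True, False]
labelled, queue = cluster_splitting_loop(vecs, labs, budget=4, sample_size=4)
print(queue)  # [[4, 6], [5, 7]]
```

The budget check mirrors the "Reached end of manual classification budget" lines in the log: once the next sample would exceed it, the remaining clusters stay unlabelled.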
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(284, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(338, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 338 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 338 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.367, 0.667, 0.583, 0.625, 0.316] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.000, 0.600, 0.818, 0.571, 0.524] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.333, 0.214, 0.333, 0.588, 0.476] (False)
[1.000, 0.000, 0.438, 0.500, 0.467, 0.529, 0.611] (False)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[1.000, 0.000, 0.818, 0.727, 0.438, 0.375, 0.400] (False)
[0.857, 0.000, 0.500, 0.389, 0.235, 0.045, 0.526] (False)
[1.000, 0.000, 0.476, 0.179, 0.500, 0.412, 0.357] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.833, 0.571, 0.727, 0.647, 0.857] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.583, 0.875, 0.727, 0.833, 0.643] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(10)818_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.980198
recall 0.331104
f-measure 0.495
da 101
dm 0
ndm 0
tp 99
fp 2
tn 4.76529e+07
fn 200
Name: (10, 1 - acm diverg, 818), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)818_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 265
Number of entity ID pairs that occurred more than once: 0
Analyse set of 265 weight vectors
Containing 152 true matches and 113 true non-matches
(57.36% true matches)
Identified 250 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 239 (95.60%)
2 : 8 (3.20%)
3 : 2 (0.80%)
4 : 1 (0.40%)
Identified 0 non-pure unique weight vectors (from 250 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 139
0.000 : 111
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 265
Number of unique weight vectors: 250
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(250, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 250 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 69
Perform initial selection using "far" method
Farthest first selection of 69 weight vectors from 250 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 35 matches and 34 non-matches
Purity of oracle classification: 0.507
Entropy of oracle classification: 1.000
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 34
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 181 weight vectors
Based on 35 matches and 34 non-matches
Classified 115 matches and 66 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 69
Size, purity, entropy, and estimated match proportion of clusters in queue:
(115, 0.5072463768115942, 0.9998484829291058, 0.5072463768115942)
(66, 0.5072463768115942, 0.9998484829291058, 0.5072463768115942)
Current size of match and non-match training data sets: 35 / 34
Selected cluster with (queue ordering: random):
- Purity 0.51 and entropy 1.00
- Size 115 weight vectors
- Estimated match proportion 0.507
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 115 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
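The "far" selections listed above follow a greedy farthest-first traversal: starting from one vector, repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and the first vector as seed (neither detail is stated in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal (Gonzalez-style k-center):
    repeatedly pick the vector whose minimum Euclidean distance
    to the selected set is largest."""
    selected = [vectors[0]]
    # Min distance from every vector to the current selected set.
    dists = [math.dist(v, vectors[0]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        far_idx = max(range(len(vectors)), key=dists.__getitem__)
        selected.append(vectors[far_idx])
        dists = [min(d, math.dist(v, vectors[far_idx]))
                 for d, v in zip(dists, vectors)]
    return selected
```

Farthest-first spreads the sample over the whole cluster, which is why the lists above mix clear matches and clear non-matches rather than near-duplicates.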
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and misclassify 0
Classified 42 matches and 11 non-matches
Purity of oracle classification: 0.792
Entropy of oracle classification: 0.737
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 11
Number of false non-matches: 0
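The purity and entropy figures reported after each oracle round (here 42 matches and 11 non-matches giving purity 0.792 and entropy 0.737) follow the standard majority-share and binary Shannon-entropy definitions; a sketch:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity = share of the majority class; entropy = binary Shannon
    entropy of the match/non-match split, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```

A perfectly pure round (e.g. 45 matches, 0 non-matches) gives purity 1.000 and entropy 0.000, matching the Loop 2 output further below.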
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
101.0
Analysing file: diverg(15)729_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984127
recall 0.207358
f-measure 0.342541
da 63
dm 0
ndm 0
tp 62
fp 1
tn 4.76529e+07
fn 237
Name: (15, 1 - acm diverg, 729), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)729_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1031
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1031 weight vectors
Containing 203 true matches and 828 true non-matches
(19.69% true matches)
Identified 981 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 947 (96.53%)
2 : 31 (3.16%)
3 : 2 (0.20%)
16 : 1 (0.10%)
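The occurrence table above (947 vectors occurring once, up to one vector occurring 16 times, summing to 1031 vectors over 981 unique ones) can be reproduced with two passes of `collections.Counter`; a sketch over hypothetical data:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each distinct weight vector to its occurrence count, then
    tabulate: occurrence count -> number of distinct vectors with it."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return dict(Counter(per_vector.values()))
```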
Identified 1 non-pure unique weight vector (from 981 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 807
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1030
Number of unique weight vectors: 981
Time to load and analyse the weight vector file: 0.01 sec
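The non-pure handling above (one distinct vector with pureness 0.938, i.e. 15 matches and 1 non-match among 16 copies, reduced by removing the single minority copy) can be sketched as below. Note this only covers the minority-removal case; the later run in this log removes all copies of a non-pure vector, so the exact rule deciding between the two behaviours is not visible here and is an assumption.

```python
def prune_minority(labels):
    """labels: match/non-match flags for all copies of ONE distinct
    weight vector. If the copies are mixed, drop the minority class
    so the vector becomes pure; pure vectors are returned unchanged."""
    matches = sum(labels)
    non_matches = len(labels) - matches
    if matches == 0 or non_matches == 0:
        return list(labels)  # already pure
    majority_is_match = matches > non_matches
    return [majority_is_match] * max(matches, non_matches)
```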
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(981, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 981 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 981 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and misclassify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 894 weight vectors
Based on 26 matches and 61 non-matches
Classified 101 matches and 793 non-matches
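The SVM step trains on the oracle-labelled vectors and splits the remaining unlabelled vectors into a candidate-match and a candidate-non-match cluster, which then both enter the queue. A minimal sketch, assuming scikit-learn's SVC with a linear kernel (the actual classifier settings are not recorded in this log):

```python
from sklearn.svm import SVC  # assumption: scikit-learn backend

def svm_split(train_vectors, train_is_match, unlabelled_vectors):
    """Train on the oracle-classified vectors, then split the rest
    into predicted matches and predicted non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(train_vectors, train_is_match)
    predictions = clf.predict(unlabelled_vectors)
    matches = [v for v, p in zip(unlabelled_vectors, predictions) if p]
    non_matches = [v for v, p in zip(unlabelled_vectors, predictions) if not p]
    return matches, non_matches
```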
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 101 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and misclassify 0
Classified 45 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
63.0
Analysing file: diverg(10)318_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987952
recall 0.274247
f-measure 0.429319
da 83
dm 0
ndm 0
tp 82
fp 1
tn 4.76529e+07
fn 217
Name: (10, 1 - acm diverg, 318), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)318_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 504
Number of entity ID pairs that occurred more than once: 0
Analyse set of 504 weight vectors
Containing 147 true matches and 357 true non-matches
(29.17% true matches)
Identified 488 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 481 (98.57%)
2 : 4 (0.82%)
3 : 2 (0.41%)
9 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 488 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 131
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 356
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 495
Number of unique weight vectors: 487
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(487, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 487 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 487 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.714, 0.353, 0.583, 0.571] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.667, 0.000, 0.850, 0.733, 0.652, 0.778, 0.474] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and misclassify 0
Classified 26 matches and 54 non-matches
Purity of oracle classification: 0.675
Entropy of oracle classification: 0.910
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 407 weight vectors
Based on 26 matches and 54 non-matches
Classified 110 matches and 297 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(110, 0.675, 0.9097361225311662, 0.325)
(297, 0.675, 0.9097361225311662, 0.325)
Current size of match and non-match training data sets: 26 / 54
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 110 weight vectors
- Estimated match proportion 0.325
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 110 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and misclassify 0
Classified 42 matches and 6 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
83.0
Analysing file: diverg(10)22_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 22), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)22_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 596
Number of entity ID pairs that occurred more than once: 0
Analyse set of 596 weight vectors
Containing 196 true matches and 400 true non-matches
(32.89% true matches)
Identified 547 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 513 (93.78%)
2 : 31 (5.67%)
3 : 2 (0.37%)
15 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 547 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 167
0.933 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 379
Removed 1 non-pure weight vector
Final number of weight vectors to use: 595
Number of unique weight vectors: 547
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(547, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 547 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 547 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and misclassify 0
Classified 31 matches and 50 non-matches
Purity of oracle classification: 0.617
Entropy of oracle classification: 0.960
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 466 weight vectors
Based on 31 matches and 50 non-matches
Classified 159 matches and 307 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(159, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
(307, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
Current size of match and non-match training data sets: 31 / 50
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 159 weight vectors
- Estimated match proportion 0.383
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 159 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.875, 1.000, 0.182, 0.267, 0.237, 0.206, 0.167] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 47 matches and 11 non-matches
Purity of oracle classification: 0.810
Entropy of oracle classification: 0.701
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 11
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
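Each loop's "Farthest first selection of k weight vectors" is the classic greedy k-center traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest. A dependency-free sketch (Euclidean distance and seeding from the first vector are assumptions; the tool's exact variant is not shown in the log):

```python
def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly select the vector
    with the largest minimum distance to the current selection."""
    def dist2(a, b):
        # squared Euclidean distance (monotone in distance, so argmax is the same)
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [vectors[0]]                # seeding choice is an assumption
    min_d = [dist2(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], dist2(v, vectors[idx]))
    return selected
```

This is why the selected samples above mix extreme vectors (all 1.000 / all 0.000) early on: the traversal maximizes spread through the weight-vector space before filling in interior points.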
55.0
Analysing file: diverg(10)439_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 439), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)439_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 544
Number of entity ID pairs that occurred more than once: 0
Analyse set of 544 weight vectors
Containing 185 true matches and 359 true non-matches
(34.01% true matches)
Identified 511 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 494 (96.67%)
2 : 14 (2.74%)
3 : 2 (0.39%)
16 : 1 (0.20%)
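The occurrence distribution above (how often each distinct weight vector appears) is a straightforward multiset count over counts; a sketch with `collections.Counter`:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct vectors with that count."""
    per_vector = Counter(tuple(v) for v in vectors)   # how often each vector occurs
    return Counter(per_vector.values())               # distribution of those counts

vecs = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9), (0.9, 0.9), (0.9, 0.9)]
print(sorted(occurrence_distribution(vecs).items()))
# [(1, 1), (2, 1), (3, 1)]
```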
Identified 1 non-pure unique weight vector (from 511 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 154
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 356
Removed 1 non-pure weight vector
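The pureness step above groups duplicate weight vectors, computes the match fraction per distinct vector, and drops the minority-class copies of any vector that is not pure (pureness strictly between 0.0 and 1.0). A hedged stdlib sketch of that filtering (the tie-breaking rule is an assumption):

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """Drop minority-class copies of non-pure weight vectors.

    labelled_vectors: list of (weight_vector_tuple, is_match) pairs.
    Returns (kept pairs, number of removed pairs).
    """
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[vec].append(is_match)

    kept, removed = [], 0
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)   # fraction of matches
        majority = pureness >= 0.5             # tie-breaking rule is an assumption
        for is_match in labels:
            if 0.0 < pureness < 1.0 and is_match != majority:
                removed += 1                   # minority-class copy dropped
            else:
                kept.append((vec, is_match))
    return kept, removed
```

For the 0.938 entry above (a vector occurring 16 times, 15 matches and 1 non-match), exactly the one non-match copy is removed, leaving 543 of the 544 vectors.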
Final number of weight vectors to use: 543
Number of unique weight vectors: 511
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(511, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 511 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 511 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 31 matches and 50 non-matches
Purity of oracle classification: 0.617
Entropy of oracle classification: 0.960
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 430 weight vectors
Based on 31 matches and 50 non-matches
Classified 127 matches and 303 non-matches
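The "SVM classification" steps train on the oracle-labelled vectors and then predict labels for the remaining ones, splitting the cluster into a predicted-match and a predicted-non-match child. A minimal sketch with scikit-learn (the kernel and parameters are assumptions; the tool's actual SVM configuration is not shown in the log):

```python
from sklearn import svm

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Fit an SVM on oracle-labelled vectors, then split the rest
    into predicted matches and non-matches."""
    clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches
```

In the loop above, the 31 matches and 50 non-matches are the training data, and the 430 unclassified vectors are split 127 / 303.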
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(127, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
(303, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
Current size of match and non-match training data sets: 31 / 50
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 303 weight vectors
- Estimated match proportion 0.383
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 303 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.778, 0.429, 0.571, 0.750, 0.600] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.750, 0.750, 0.688, 0.500, 0.800] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.625, 0.526, 0.300, 0.778, 0.609] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.615, 0.826, 0.286, 0.857, 0.643] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.611, 0.000, 0.800, 0.684, 0.500, 0.778, 0.609] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.714, 0.500, 0.500, 0.412, 0.571] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 0 matches and 70 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(10)951_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.982143
recall 0.183946
f-measure 0.309859
da 56
dm 0
ndm 0
tp 55
fp 1
tn 4.76529e+07
fn 244
Name: (10, 1 - acm diverg, 951), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)951_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 668
Number of entity ID pairs that occurred more than once: 0
Analyse set of 668 weight vectors
Containing 201 true matches and 467 true non-matches
(30.09% true matches)
Identified 617 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 583 (94.49%)
2 : 31 (5.02%)
3 : 2 (0.32%)
17 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 617 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 170
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 446
Removed 1 non-pure weight vector
Final number of weight vectors to use: 667
Number of unique weight vectors: 617
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(617, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 617 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 617 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 24 matches and 59 non-matches
Purity of oracle classification: 0.711
Entropy of oracle classification: 0.868
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 534 weight vectors
Based on 24 matches and 59 non-matches
Classified 98 matches and 436 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(98, 0.7108433734939759, 0.8676293117125106, 0.2891566265060241)
(436, 0.7108433734939759, 0.8676293117125106, 0.2891566265060241)
Current size of match and non-match training data sets: 24 / 59
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 98 weight vectors
- Estimated match proportion 0.289
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 98 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 44 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
56.0
Analysing file: diverg(10)537_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 537), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)537_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 663
Number of entity ID pairs that occurred more than once: 0
Analyse set of 663 weight vectors
Containing 194 true matches and 469 true non-matches
(29.26% true matches)
Identified 642 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 628 (97.82%)
2 : 11 (1.71%)
3 : 2 (0.31%)
7 : 1 (0.16%)
Identified 0 non-pure unique weight vectors (from 642 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.000 : 469
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 663
Number of unique weight vectors: 642
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(642, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 642 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 642 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
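The "far" selection above is a farthest-first traversal: start from one vector, then repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch assuming Euclidean distance (the function name and the choice of starting vector are illustrative, not taken from the tool):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: seed with the first vector,
    then repeatedly add the vector farthest from the selected set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # assumed seed; the tool's seed rule is not shown
    while len(selected) < k:
        # A candidate's distance to the selection is its min distance
        # to any already-selected vector; take the candidate maximising it.
        best = max(
            (v for v in vectors if v not in selected),
            key=lambda v: min(dist(v, s) for s in selected),
        )
        selected.append(best)
    return selected
```

This greedy scheme explains why the selected vectors above are so spread out: extremes like all-1.000 and all-0.000 rows are picked early.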
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
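The purity and entropy reported above follow directly from the oracle's match/non-match counts: purity is the majority-class fraction of the sample, and entropy is the binary Shannon entropy of the split. A minimal sketch (function names are illustrative):

```python
import math

def purity(n_match, n_nonmatch):
    """Fraction of the majority class in the classified sample."""
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    """Binary Shannon entropy (in bits) of the match/non-match split."""
    total = n_match + n_nonmatch
    h = 0.0
    for n in (n_match, n_nonmatch):
        p = n / total
        if p > 0:
            h -= p * math.log2(p)
    return h
```

With the 28 matches and 55 non-matches above, `purity(28, 55)` gives 0.663 and `entropy(28, 55)` gives 0.922, matching the logged values.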
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 559 weight vectors
Based on 28 matches and 55 non-matches
Classified 136 matches and 423 non-matches
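The step above trains an SVM on the oracle-labelled sample and uses it to split the remaining unlabelled weight vectors into predicted matches and non-matches. A minimal sketch with scikit-learn; the linear kernel and default parameters are assumptions, since the log does not show the tool's SVM configuration:

```python
from sklearn import svm

def split_cluster(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on oracle-labelled weight vectors and split the
    unlabelled remainder into predicted matches / non-matches."""
    clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, pred) if p]
    non_matches = [v for v, p in zip(remaining_vecs, pred) if not p]
    return matches, non_matches
```

The two resulting sub-clusters (here 136 and 423 vectors) are then pushed back onto the queue for further refinement.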
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(423, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 423 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 423 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.783, 0.583, 0.435, 0.765, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.407, 0.643, 0.667, 0.500, 0.563] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 9 matches and 62 non-matches
Purity of oracle classification: 0.873
Entropy of oracle classification: 0.548
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)759_NEW.csv
<class 'pandas.core.series.Series'>
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 759), dtype: object
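The per-file metrics in the Series above relate in the usual way: precision and recall come from the tp/fp/fn counts, and the f-measure is their harmonic mean. A quick sketch to verify (function names are illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positive and false negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

With tp=42, fp=0, fn=257 from the log, this gives precision 1.0, recall ≈ 0.140468, and f-measure ≈ 0.246334, matching the reported values.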
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)759_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 750
Number of entity ID pairs that occurred more than once: 0
Analyse set of 750 weight vectors
Containing 222 true matches and 528 true non-matches
(29.60% true matches)
Identified 714 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 695 (97.34%)
2 : 16 (2.24%)
3 : 2 (0.28%)
17 : 1 (0.14%)
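The distribution above counts how often each unique weight vector occurs, then tallies how many vectors share each frequency. A minimal sketch with two nested `collections.Counter` passes, assuming the vectors are given as sequences of floats:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique weight vectors
    occurring exactly that often (as in the table above)."""
    per_vector = Counter(map(tuple, vectors))  # vector -> its frequency
    return Counter(per_vector.values())        # frequency -> vector count
```

For this file the result would be {1: 695, 2: 16, 3: 2, 17: 1}, which sums back to 750 vectors over 714 unique ones.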
Identified 1 non-pure unique weight vector (from 714 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 525
Removed 1 non-pure weight vector
Final number of weight vectors to use: 749
Number of unique weight vectors: 714
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(714, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 714 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 714 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 28 matches and 56 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 630 weight vectors
Based on 28 matches and 56 non-matches
Classified 133 matches and 497 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(497, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 28 / 56
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.92
- Size 497 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 497 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 13 matches and 60 non-matches
Purity of oracle classification: 0.822
Entropy of oracle classification: 0.676
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)787_NEW.csv
<class 'pandas.core.series.Series'>
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 787), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)787_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 118 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 118 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
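The purity and entropy figures reported for each oracle classification can be reproduced with the usual binary definitions. A minimal sketch, assuming purity is the majority-class fraction and entropy is the base-2 Shannon entropy of the match proportion (both assumptions, though they reproduce the logged values):

```python
import math

def purity_entropy(num_matches: int, num_non_matches: int):
    """Purity is the majority-class fraction; entropy is the binary
    Shannon entropy (base 2) of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# 47 matches and 1 non-match, as in the oracle classification above
purity, entropy = purity_entropy(47, 1)
```

For the 47/1 split this gives purity 0.979 and entropy 0.146, matching the log.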
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
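The "Farthest first selection" steps logged in each loop can be sketched with the standard greedy farthest-first traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest. This is a sketch under assumptions (Euclidean distance, first vector as seed); the actual implementation may pick its seed and metric differently:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of numeric tuples:
    select up to k vectors that are maximally spread out."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # arbitrary seed (an assumption)
    # min_dist[j] = distance from vectors[j] to its nearest selected vector
    min_dist = [dist(v, vectors[0]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        i = max(range(len(vectors)), key=lambda j: min_dist[j])
        selected.append(vectors[i])
        min_dist = [min(d, dist(v, vectors[i]))
                    for v, d in zip(vectors, min_dist)]
    return selected
```

The resulting sample favours extreme, mutually distant weight vectors, which is why the selections above mix very high and very low similarity values.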
Analysing the file: diverg(15)51_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 51), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)51_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 824
Number of entity ID pairs that occurred more than once: 0
Analyse set of 824 weight vectors
Containing 209 true matches and 615 true non-matches
(25.36% true matches)
Identified 777 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 742 (95.50%)
2 : 32 (4.12%)
3 : 2 (0.26%)
12 : 1 (0.13%)
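The occurrence distribution above (how many unique weight vectors appear once, twice, and so on) can be reproduced by a two-level count over the vectors; a minimal sketch using `collections.Counter` (the tuple-keyed representation is an assumption):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique weight vectors that
    occur exactly that often (e.g. {1: 742, 2: 32, 3: 2, 12: 1})."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())

# Example: three distinct vectors, one of them duplicated
dist = occurrence_distribution([[0.5, 1.0], [0.5, 1.0], [0.2, 0.3], [0.9, 0.9]])
```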
Identified 1 non-pure unique weight vector (from 777 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 594
Removed 1 non-pure weight vector
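A non-pure unique weight vector is one whose duplicate copies carry both match and non-match labels; the removal step above drops the minority-class copies. A sketch of that filter (the tuple-keyed grouping and the tie-breaking rule are assumptions):

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """labelled_vectors: iterable of (vector, is_match) pairs.
    For each unique weight vector, keep only the copies whose label is in
    the majority for that vector; minority-class copies are dropped."""
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[tuple(vec)].append(is_match)

    kept = []
    for vec, is_match in labelled_vectors:
        labels = groups[tuple(vec)]
        majority = sum(labels) * 2 >= len(labels)  # ties resolved to match (assumption)
        if is_match == majority:
            kept.append((vec, is_match))
    return kept
```

For example, a vector occurring 12 times with 11 match and 1 non-match copies has pureness 0.917; the single non-match copy is removed, as in the run above.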
Final number of weight vectors to use: 823
Number of unique weight vectors: 777
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(777, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 777 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 777 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 692 weight vectors
Based on 29 matches and 56 non-matches
Classified 153 matches and 539 non-matches
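Each loop ends by training an SVM on the oracle-labelled samples and using it to split the remaining cluster members into predicted matches and non-matches, which become the two child clusters in the queue. A minimal sketch with scikit-learn (the kernel and parameters are assumptions, not the author's settings):

```python
from sklearn import svm

def split_cluster(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on oracle-labelled weight vectors and split the
    remaining vectors into predicted matches and non-matches."""
    clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches
```

In the run above, a classifier trained on 29 matches and 56 non-matches splits the 692 unsampled vectors into clusters of 153 and 539.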
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(539, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 539 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 539 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.791, 1.000, 0.275, 0.269, 0.192, 0.084, 0.200] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 2 matches and 72 non-matches
Purity of oracle classification: 0.973
Entropy of oracle classification: 0.179
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)188_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (20, 1 - acm diverg, 188), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)188_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 920
Number of entity ID pairs that occurred more than once: 0
Analyse set of 920 weight vectors
Containing 215 true matches and 705 true non-matches
(23.37% true matches)
Identified 868 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 832 (95.85%)
2 : 33 (3.80%)
3 : 2 (0.23%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 868 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 684
Removed 1 non-pure weight vector
Final number of weight vectors to use: 919
Number of unique weight vectors: 868
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(868, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 868 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 868 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 782 weight vectors
Based on 28 matches and 58 non-matches
Classified 158 matches and 624 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(158, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(624, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 624 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 624 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 2 matches and 72 non-matches
Purity of oracle classification: 0.973
Entropy of oracle classification: 0.179
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing the file: diverg(20)91_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 91), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)91_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
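The purity and entropy figures reported for each oracle sample are the majority-class fraction and the binary Shannon entropy of the match / non-match split. A minimal sketch that reproduces the numbers above (the helper name `purity_entropy` is illustrative, not the tool's actual function):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction of the sample; entropy is
    the binary Shannon entropy of the match / non-match proportions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# Loop 1 oracle sample above: 23 matches, 65 non-matches
purity, entropy = purity_entropy(23, 65)
# purity ≈ 0.739, entropy ≈ 0.829, matching the log
```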
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
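The SVM step trains on the oracle-labelled samples and splits the remaining weight vectors of the cluster into predicted matches and non-matches. A hypothetical sketch assuming scikit-learn; the linear kernel and the function name `svm_split` are assumptions, since the log does not state the tool's actual SVM parameters:

```python
import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn SVM

def svm_split(train_vecs, train_labels, rest_vecs):
    """Fit an SVM on oracle-labelled vectors (1 = match, 0 = non-match)
    and partition the unlabelled rest by the predicted class."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(np.asarray(train_vecs), np.asarray(train_labels))
    pred = clf.predict(np.asarray(rest_vecs))
    matches = [v for v, p in zip(rest_vecs, pred) if p == 1]
    non_matches = [v for v, p in zip(rest_vecs, pred) if p == 0]
    return matches, non_matches
```

Each predicted group then becomes a new cluster in the queue, which is why the queue length grows to 2 in the next loop.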
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
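The "farthest first selection" used throughout is a greedy farthest-first traversal: repeatedly pick the vector whose distance to the closest already-selected vector is largest. A minimal sketch; the starting index and Euclidean metric are assumptions, as the log does not state them:

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a set of weight vectors.
    Returns the indices of the k selected vectors."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]
    # distance of every vector to its nearest selected vector so far
    dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist))  # farthest from the selected set
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

This tends to pick samples spread across the whole cluster, which is why the selected vectors above mix clear matches and clear non-matches.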
39.0
Analysing the file: diverg(20)648_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 648), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)648_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
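The frequency distribution reported above (how many distinct weight vectors occur once, twice, etc.) amounts to counting duplicates twice over. A small sketch using `collections.Counter`; the function name is illustrative:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct weight vectors that
    occur that often, matching the frequency table printed in the log."""
    per_vector = Counter(tuple(v) for v in vectors)  # count each vector
    return Counter(per_vector.values())              # count the counts
```

For example, a list where one vector appears twice and another once yields `{2: 1, 1: 1}`.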
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 27 matches and 60 non-matches
Classified 148 matches and 784 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(784, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 784 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 784 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.067, 0.550, 0.636, 0.500, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 8 matches and 66 non-matches
Purity of oracle classification: 0.892
Entropy of oracle classification: 0.494
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)257_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 257), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)257_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 298
Number of entity ID pairs that occurred more than once: 0
Analyse set of 298 weight vectors
Containing 189 true matches and 109 true non-matches
(63.42% true matches)
Identified 274 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 261 (95.26%)
2 : 10 (3.65%)
3 : 2 (0.73%)
11 : 1 (0.36%)
Identified 1 non-pure unique weight vector (from 274 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 108
Removed 1 non-pure weight vector
Final number of weight vectors to use: 297
Number of unique weight vectors: 274
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(274, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 274 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 71
Perform initial selection using "far" method
Farthest first selection of 71 weight vectors from 274 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 36 matches and 35 non-matches
Purity of oracle classification: 0.507
Entropy of oracle classification: 1.000
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 35
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 203 weight vectors
Based on 36 matches and 35 non-matches
Classified 141 matches and 62 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 71
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.5070422535211268, 0.9998568991526107, 0.5070422535211268)
(62, 0.5070422535211268, 0.9998568991526107, 0.5070422535211268)
Current size of match and non-match training data sets: 36 / 35
Selected cluster with (queue ordering: random):
- Purity 0.51 and entropy 1.00
- Size 62 weight vectors
- Estimated match proportion 0.507
Sample size for this cluster: 38
Farthest first selection of 38 weight vectors from 62 vectors
The selected farthest weight vectors are:
[0.530, 1.000, 0.159, 0.086, 0.182, 0.159, 0.163] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.800, 1.000, 0.242, 0.121, 0.200, 0.171, 0.000] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 38 weight vectors
The oracle will correctly classify 38 weight vectors and wrongly classify 0
Classified 0 matches and 38 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 38 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(15)880_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 880), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)880_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 810
Number of entity ID pairs that occurred more than once: 0
Analyse set of 810 weight vectors
Containing 223 true matches and 587 true non-matches
(27.53% true matches)
Identified 756 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 719 (95.11%)
2 : 34 (4.50%)
3 : 2 (0.26%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 756 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 566
Removed 1 non-pure weight vector
Final number of weight vectors to use: 809
Number of unique weight vectors: 756
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(756, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 756 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 756 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
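Farthest-first selection greedily adds, at each step, the vector whose distance to its nearest already-selected vector is largest. A minimal sketch assuming Euclidean distance and an arbitrary starting vector:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly add the vector
    whose nearest already-selected vector is farthest away
    (Euclidean distance assumed)."""
    selected = [vectors[start]]
    remaining = vectors[:start] + vectors[start + 1:]
    while len(selected) < k and remaining:
        # Each candidate is scored by its distance to the closest
        # selected vector; the highest score wins.
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# [10, 0] is picked before [1, 0]: it lies farthest from [0, 0].
sample = farthest_first([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]], k=2)
```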
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
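The purity and entropy figures reported after each oracle classification follow directly from the match/non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A short check reproducing the 0.682 / 0.902 above from 27 matches and 58 non-matches:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity: fraction of the majority class.
    Entropy: binary Shannon entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = purity_and_entropy(27, 58)
# round(purity, 3) == 0.682 and round(entropy, 3) == 0.902,
# matching the oracle summary above.
```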
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 671 weight vectors
Based on 27 matches and 58 non-matches
Classified 94 matches and 577 non-matches
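The SVM step trains on the oracle-labelled sample and then classifies the cluster's remaining vectors. A minimal sketch assuming scikit-learn with default hyperparameters and synthetic stand-in data (the run's actual kernel and regularization settings are not shown in the log):

```python
# Synthetic stand-in data; the real run uses the oracle-labelled
# weight vectors instead.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Match-class training vectors cluster near 1, non-matches near 0.
match_X = rng.uniform(0.6, 1.0, size=(10, 7))
non_X = rng.uniform(0.0, 0.4, size=(10, 7))
train_X = np.vstack([match_X, non_X])
train_y = np.array([True] * 10 + [False] * 10)

# Unlabelled remainder of the cluster.
rest_X = rng.random((50, 7))

clf = SVC(kernel="rbf")      # kernel choice is an assumption
clf.fit(train_X, train_y)
pred = clf.predict(rest_X)   # True = match, False = non-match
```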
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(577, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 577 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 577 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 20 matches and 53 non-matches
Purity of oracle classification: 0.726
Entropy of oracle classification: 0.847
Number of true matches: 20
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(10)838_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 838), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)838_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 733
Number of entity ID pairs that occurred more than once: 0
Analyse set of 733 weight vectors
Containing 198 true matches and 535 true non-matches
(27.01% true matches)
Identified 691 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 656 (94.93%)
2 : 32 (4.63%)
3 : 2 (0.29%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 691 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.000 : 515
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 733
Number of unique weight vectors: 691
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(691, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 691 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 691 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 26 matches and 58 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.893
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 607 weight vectors
Based on 26 matches and 58 non-matches
Classified 136 matches and 471 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
(471, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
Current size of match and non-match training data sets: 26 / 58
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 136 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 136 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 49 matches and 2 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.239
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(10)526_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.980198
recall 0.331104
f-measure 0.495
da 101
dm 0
ndm 0
tp 99
fp 2
tn 4.76529e+07
fn 200
Name: (10, 1 - acm diverg, 526), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)526_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 248
Number of entity ID pairs that occurred more than once: 0
Analyse set of 248 weight vectors
Containing 147 true matches and 101 true non-matches
(59.27% true matches)
Identified 233 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 222 (95.28%)
2 : 8 (3.43%)
3 : 2 (0.86%)
4 : 1 (0.43%)
Identified 0 non-pure unique weight vectors (from 233 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 134
0.000 : 99
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 248
Number of unique weight vectors: 233
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(233, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 233 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 68
Perform initial selection using "far" method
Farthest first selection of 68 weight vectors from 233 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.344, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 35 matches and 33 non-matches
Purity of oracle classification: 0.515
Entropy of oracle classification: 0.999
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 33
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 165 weight vectors
Based on 35 matches and 33 non-matches
Classified 105 matches and 60 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 68
Size, purity, entropy, and estimated match proportion of clusters in queue:
(105, 0.5147058823529411, 0.9993759069576514, 0.5147058823529411)
(60, 0.5147058823529411, 0.9993759069576514, 0.5147058823529411)
Current size of match and non-match training data sets: 35 / 33
Selected cluster with (queue ordering: random):
- Purity 0.51 and entropy 1.00
- Size 60 weight vectors
- Estimated match proportion 0.515
Sample size for this cluster: 37
Farthest first selection of 37 weight vectors from 60 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[1.000, 0.000, 0.778, 0.636, 0.375, 0.556, 0.625] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
Perform oracle with 100.00% accuracy on 37 weight vectors
The oracle will correctly classify 37 weight vectors and incorrectly classify 0
Classified 0 matches and 37 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 37
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 37 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
101.0
Analysing the file: diverg(10)931_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990099
recall 0.334448
f-measure 0.5
da 101
dm 0
ndm 0
tp 100
fp 1
tn 4.76529e+07
fn 199
Name: (10, 1 - acm diverg, 931), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)931_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 999
Number of entity ID pairs that occurred more than once: 0
Analyse set of 999 weight vectors
Containing 164 true matches and 835 true non-matches
(16.42% true matches)
Identified 960 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 931 (96.98%)
2 : 26 (2.71%)
3 : 2 (0.21%)
10 : 1 (0.10%)
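The occurrence histogram above (how many weight vectors occur once, twice, and so on) can be reproduced with two passes of `collections.Counter`; the toy vectors below are hypothetical:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # First pass: how often does each exact weight vector occur?
    vec_counts = Counter(tuple(v) for v in weight_vectors)
    # Second pass: how many unique vectors occur that often?
    return Counter(vec_counts.values())

# Hypothetical toy data: one vector repeated twice, plus two singletons
vecs = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.2, 0.8)]
print(occurrence_distribution(vecs))  # Counter({1: 2, 2: 1})
```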
Identified 1 non-pure unique weight vector (from 960 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 145
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 814
Removed 1 non-pure weight vector
Final number of weight vectors to use: 998
Number of unique weight vectors: 960
Time to load and analyse the weight vector file: 0.01 sec
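The non-pure cleaning step above (999 vectors in, 998 out: the single minority-class copy of the 0.900-pure vector is dropped) might look like the sketch below; the grouping by exact vector value matches the log, but the tie-breaking rule is an assumption:

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    # Group the class labels of all copies of each unique weight vector.
    by_vec = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        by_vec[tuple(vec)].append(lab)
    # Keep only copies that carry the majority label of their group;
    # minority-class copies (the "non-pure" ones) are dropped.
    kept_vecs, kept_labs = [], []
    for vec, lab in zip(vectors, labels):
        labs = by_vec[tuple(vec)]
        majority = labs.count(True) >= labs.count(False)  # tie-break assumed
        if lab == majority:
            kept_vecs.append(vec)
            kept_labs.append(lab)
    return kept_vecs, kept_labs
```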
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(960, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 960 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 960 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
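The "farthest first selection" steps in this log can be sketched as the classic greedy traversal over Euclidean distance: repeatedly pick the vector whose minimum distance to the already-selected set is largest. The seeding rule of the actual program is an assumption here:

```python
import math

def farthest_first(vectors, k, seed_index=0):
    # Start from an (assumed) seed vector, then greedily add the vector
    # farthest from everything selected so far.
    selected = [seed_index]
    min_dist = [math.dist(vectors[seed_index], v) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update each vector's distance to its nearest selected vector.
        for i, v in enumerate(vectors):
            d = math.dist(vectors[nxt], v)
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```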
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and incorrectly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
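The purity and entropy figures reported throughout this log follow directly from the match / non-match counts; for the 24 / 63 split above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the fraction of the majority class; entropy is the
    binary Shannon entropy (in bits) of the match/non-match split."""
    total = num_matches + num_non_matches
    purity = max(num_matches, num_non_matches) / total
    entropy = 0.0
    for count in (num_matches, num_non_matches):
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return purity, entropy

# Counts reproduced from the log above: 24 matches, 63 non-matches
p, e = purity_entropy(24, 63)
print(round(p, 3), round(e, 3))  # 0.724 0.85
```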
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 873 weight vectors
Based on 24 matches and 63 non-matches
Classified 84 matches and 789 non-matches
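The SVM propagation step above (train on the 24 + 63 oracle-labelled vectors, then split the remaining 873 unlabelled ones) can be sketched with scikit-learn; the kernel and parameters the program actually uses are assumptions:

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, remaining_vecs):
    # Fit an SVM on the oracle-classified weight vectors, then predict
    # a match/non-match label for every remaining vector.
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if not p]
    return matches, non_matches

# Hypothetical, trivially separable example
m, n = svm_split([[0, 0], [1, 1]], [False, True], [[0.1, 0.1], [0.9, 0.9]])
```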
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(84, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(789, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 789 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 789 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and incorrectly classify 0
Classified 14 matches and 56 non-matches
Purity of oracle classification: 0.800
Entropy of oracle classification: 0.722
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
101.0
Analysing the file: diverg(15)997_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 997), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)997_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 665
Number of entity ID pairs that occurred more than once: 0
Analyse set of 665 weight vectors
Containing 217 true matches and 448 true non-matches
(32.63% true matches)
Identified 628 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 610 (97.13%)
2 : 15 (2.39%)
3 : 2 (0.32%)
19 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 628 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 445
Removed 1 non-pure weight vector
Final number of weight vectors to use: 664
Number of unique weight vectors: 628
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(628, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 628 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 628 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and incorrectly classify 0
Classified 27 matches and 56 non-matches
Purity of oracle classification: 0.675
Entropy of oracle classification: 0.910
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 545 weight vectors
Based on 27 matches and 56 non-matches
Classified 133 matches and 412 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.6746987951807228, 0.9100534290139191, 0.3253012048192771)
(412, 0.6746987951807228, 0.9100534290139191, 0.3253012048192771)
Current size of match and non-match training data sets: 27 / 56
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 133 weight vectors
- Estimated match proportion 0.325
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 133 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and incorrectly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(15)135_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (15, 1 - acm diverg, 135), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)135_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 377
Number of entity ID pairs that occurred more than once: 0
Analyse set of 377 weight vectors
Containing 206 true matches and 171 true non-matches
(54.64% true matches)
Identified 343 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 327 (95.34%)
2 : 13 (3.79%)
3 : 2 (0.58%)
18 : 1 (0.29%)
Identified 1 non-pure unique weight vector (from 343 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 168
Removed 1 non-pure weight vector
Final number of weight vectors to use: 376
Number of unique weight vectors: 343
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(343, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 343 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 75
Perform initial selection using "far" method
Farthest first selection of 75 weight vectors from 343 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and incorrectly classify 0
Classified 30 matches and 45 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 45
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
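The purity and entropy figures reported above follow directly from the binary match/non-match counts: purity is the majority-class fraction and entropy is the base-2 Shannon entropy over the two classes. A minimal sketch (values and function name are illustrative, not from the tool):

```python
import math

def purity_entropy(matches: int, non_matches: int):
    """Binary-class purity (majority fraction) and base-2 entropy."""
    total = matches + non_matches
    p = matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# The counts reported for this oracle round: 30 matches, 45 non-matches
purity, entropy = purity_entropy(30, 45)
print(round(purity, 3), round(entropy, 3))  # 0.6 0.971
```

This reproduces the logged values 0.600 and 0.971 exactly (0.9709505944546686 before rounding).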
SVM classification of 268 weight vectors
Based on 30 matches and 45 non-matches
Classified 145 matches and 123 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 75
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.6, 0.9709505944546686, 0.4)
(123, 0.6, 0.9709505944546686, 0.4)
Current size of match and non-match training data sets: 30 / 45
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 123 weight vectors
- Estimated match proportion 0.400
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 123 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.778, 0.636, 0.375, 0.556, 0.625] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[0.488, 1.000, 0.231, 0.167, 0.105, 0.122, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 3 matches and 50 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
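The "Farthest first selection" steps in this log pick a diverse sample from a cluster by greedily adding, at each step, the vector whose minimum distance to the already-selected set is largest. A minimal stdlib-only sketch (Euclidean distance and a fixed starting index are assumptions; the tool's actual metric and seed are not shown in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly add the vector whose
    minimum Euclidean distance to the already-selected set is largest."""
    selected = [start]
    # current minimum distance from each vector to the selected set
    min_dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            d = math.dist(v, vectors[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return [vectors[i] for i in selected]

# Toy example: from a unit square plus its centre, the far corners
# are picked before the centre point
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(farthest_first(pts, 3))
```

Each selection runs in O(n) per added point, which is why the log can afford it on clusters of several hundred weight vectors.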
48.0
Analysing file: diverg(15)95_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 95), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)95_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 788
Number of entity ID pairs that occurred more than once: 0
Analyse set of 788 weight vectors
Containing 224 true matches and 564 true non-matches
(28.43% true matches)
Identified 749 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 730 (97.46%)
2 : 16 (2.14%)
3 : 2 (0.27%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 749 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 561
Removed 1 non-pure weight vector
Final number of weight vectors to use: 787
Number of unique weight vectors: 749
Time to load and analyse the weight vector file: 0.01 sec
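The load-and-analyse step above counts how often each distinct weight vector occurs, computes the "pureness" of each (fraction of its occurrences labelled as matches), and drops the minority-class copies of any non-pure vector. A sketch over hypothetical toy pairs (the data and the 0.5 majority threshold are assumptions):

```python
from collections import Counter

# Hypothetical toy data: (weight_vector, is_match) pairs
pairs = [
    ((1.0, 0.0), False), ((1.0, 0.0), False),
    ((0.9, 1.0), True), ((0.9, 1.0), True), ((0.9, 1.0), False),
    ((0.2, 0.1), False),
]

# Frequency distribution: how often does each unique vector occur?
occ = Counter(v for v, _ in pairs)
freq = Counter(occ.values())  # occurrence count -> number of vectors
print(dict(freq))             # {2: 1, 3: 1, 1: 1}

# Pureness per unique vector: fraction of its occurrences labelled True
matches = Counter(v for v, m in pairs if m)
pureness = {v: matches[v] / n for v, n in occ.items()}

# Keep pure vectors whole; for non-pure vectors (0 < pureness < 1),
# keep only the majority-class copies
cleaned = [(v, m) for v, m in pairs
           if pureness[v] in (0.0, 1.0)
           or m == (pureness[v] >= 0.5)]
print(len(pairs), "->", len(cleaned))  # 6 -> 5
```

This mirrors the logged behaviour where a single vector with pureness 0.950 loses its one minority-class copy, reducing 788 vectors to 787.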
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(749, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 749 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 749 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 34 matches and 51 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 664 weight vectors
Based on 34 matches and 51 non-matches
Classified 153 matches and 511 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6, 0.9709505944546686, 0.4)
(511, 0.6, 0.9709505944546686, 0.4)
Current size of match and non-match training data sets: 34 / 51
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 511 weight vectors
- Estimated match proportion 0.400
Sample size for this cluster: 78
Farthest first selection of 78 weight vectors from 511 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.296, 0.600, 0.471, 0.600, 0.643] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 4 matches and 74 non-matches
Purity of oracle classification: 0.949
Entropy of oracle classification: 0.292
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)562_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 562), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)562_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1073
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1073 weight vectors
Containing 226 true matches and 847 true non-matches
(21.06% true matches)
Identified 1016 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 979 (96.36%)
2 : 34 (3.35%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1016 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 826
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1072
Number of unique weight vectors: 1016
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1016, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1016 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1016 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 929 weight vectors
Based on 30 matches and 57 non-matches
Classified 158 matches and 771 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(158, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(771, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 158 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 158 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 49 matches and 7 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)860_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 860), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)860_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 789
Number of entity ID pairs that occurred more than once: 0
Analyse set of 789 weight vectors
Containing 225 true matches and 564 true non-matches
(28.52% true matches)
Identified 750 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 731 (97.47%)
2 : 16 (2.13%)
3 : 2 (0.27%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 750 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 561
Removed 1 non-pure weight vector
Final number of weight vectors to use: 788
Number of unique weight vectors: 750
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(750, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 750 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 750 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
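The selection above can be sketched as a greedy farthest-first traversal: repeatedly pick the vector whose distance to its nearest already-selected vector is largest. The starting vector and the distance metric are assumptions here (Euclidean distance and a fixed start index are used):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedily pick k row indices: after the start vector, always take
    the vector farthest from the closest already-selected vector."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # min_dist[i] = distance from vector i to the closest selected vector
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

# Toy points on a line: after index 0, the farthest is 2, then 1
print(farthest_first([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]], 3))  # [0, 2, 1]
```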
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 34 matches and 51 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
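The purity and entropy figures reported above follow directly from the match / non-match counts: purity is the majority-class fraction and entropy is the Shannon entropy (base 2) of the split. A sketch reproducing the Loop 1 values (34 matches, 51 non-matches):

```python
from math import log2

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy of the match / non-match proportions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * log2(q)
    return purity, entropy

# Counts from the oracle result above: 34 matches, 51 non-matches
purity, entropy = purity_entropy(34, 51)
print(round(purity, 3), round(entropy, 3))  # 0.6 0.971
```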
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 665 weight vectors
Based on 34 matches and 51 non-matches
Classified 153 matches and 512 non-matches
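The SVM split above trains on the oracle-labelled sample (34 matches / 51 non-matches) and classifies the remaining 665 vectors of the cluster. A minimal stand-in using scikit-learn with made-up two-dimensional data; the actual kernel and parameters are not shown in the log, so a linear kernel is assumed:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical oracle-labelled weight vectors (1 = match, 0 = non-match)
train_X = np.array([[0.9, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.1]])
train_y = np.array([1, 1, 0, 0])
# Unlabelled vectors remaining in the cluster
rest_X = np.array([[0.95, 0.85], [0.15, 0.15]])

clf = SVC(kernel="linear")  # kernel is an assumption; the log does not say
clf.fit(train_X, train_y)
pred = clf.predict(rest_X)
print(pred.tolist())  # [1, 0]
```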
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6, 0.9709505944546686, 0.4)
(512, 0.6, 0.9709505944546686, 0.4)
Current size of match and non-match training data sets: 34 / 51
Selected cluster with (queue ordering: random):
- Purity 0.60 and entropy 0.97
- Size 512 weight vectors
- Estimated match proportion 0.400
Sample size for this cluster: 78
Farthest first selection of 78 weight vectors from 512 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.870, 0.619, 0.643, 0.700, 0.524] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.538, 0.613, 0.789, 0.227, 0.857] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 4 matches and 74 non-matches
Purity of oracle classification: 0.949
Entropy of oracle classification: 0.292
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)718_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 718), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)718_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 895
Number of entity ID pairs that occurred more than once: 0
Analyse set of 895 weight vectors
Containing 198 true matches and 697 true non-matches
(22.12% true matches)
Identified 850 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 816 (96.00%)
2 : 31 (3.65%)
3 : 2 (0.24%)
11 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 850 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 676
Removed 1 non-pure weight vector
Final number of weight vectors to use: 894
Number of unique weight vectors: 850
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(850, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 850 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 850 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 31 matches and 55 non-matches
Purity of oracle classification: 0.640
Entropy of oracle classification: 0.943
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 764 weight vectors
Based on 31 matches and 55 non-matches
Classified 194 matches and 570 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(194, 0.6395348837209303, 0.9430685934712908, 0.36046511627906974)
(570, 0.6395348837209303, 0.9430685934712908, 0.36046511627906974)
Current size of match and non-match training data sets: 31 / 55
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 194 weight vectors
- Estimated match proportion 0.360
Sample size for this cluster: 61
Farthest first selection of 61 weight vectors from 194 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.619, 1.000, 0.103, 0.163, 0.129, 0.146, 0.213] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 61 weight vectors
The oracle will correctly classify 61 weight vectors and wrongly classify 0
Classified 41 matches and 20 non-matches
Purity of oracle classification: 0.672
Entropy of oracle classification: 0.913
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 20
Number of false non-matches: 0
Deleted 61 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(15)221_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 221), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)221_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 953
Number of entity ID pairs that occurred more than once: 0
Analyse set of 953 weight vectors
Containing 201 true matches and 752 true non-matches
(21.09% true matches)
Identified 908 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 874 (96.26%)
2 : 31 (3.41%)
3 : 2 (0.22%)
11 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 908 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 952
Number of unique weight vectors: 908
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(908, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 908 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 908 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 821 weight vectors
Based on 24 matches and 63 non-matches
Classified 115 matches and 706 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(115, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(20)378_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 378), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)378_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
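The unique-vector count and the frequency distribution above can be derived with a pair of counters (a sketch on made-up toy vectors):

```python
from collections import Counter

# Toy weight vectors (tuples so they are hashable).
vectors = [(1.0, 0.5), (1.0, 0.5), (0.3, 0.2), (0.9, 0.9), (0.3, 0.2), (0.3, 0.2)]

occurrences = Counter(vectors)             # unique vector -> how often it occurs
freq_dist = Counter(occurrences.values())  # occurrence count -> number of unique vectors

print(len(occurrences))           # 3 unique weight vectors
print(sorted(freq_dist.items()))  # [(1, 1), (2, 1), (3, 1)]
```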
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as the fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
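The removal of minority-class copies of non-pure weight vectors (e.g. the single minority-class copy among the 12 occurrences at pureness 0.917 above) can be sketched as follows; ties and edge cases may be handled differently in the actual code:

```python
from collections import defaultdict

def remove_non_pure_minority(vectors, labels):
    """Group identical weight vectors; within each non-pure group, drop the
    copies carrying the minority label and keep the majority-label copies."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[tuple(vec)].append(label)
    kept = []
    for vec, labs in groups.items():
        majority = labs.count(True) * 2 >= len(labs)  # True wins ties
        kept.extend((list(vec), lab) for lab in labs if lab == majority)
    return kept

# Three copies of one vector labelled (True, True, False): the single
# False copy is removed, leaving 3 of the original 4 vectors.
data = remove_non_pure_minority(
    [[1.0, 0.9], [1.0, 0.9], [1.0, 0.9], [0.1, 0.2]],
    [True, True, False, False])
print(len(data))  # 3
```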
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 101 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)79_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 79), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)79_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as the fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 566 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 566 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 7 matches and 66 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.456
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)966_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 966), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)966_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as the fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
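The "farthest first" selection used above can be sketched as a greedy farthest-first traversal: start from one vector, then repeatedly pick the vector whose distance to the already-selected set is largest. This is a minimal Python sketch assuming Euclidean distance and a fixed starting index; the actual implementation and tie-breaking may differ.

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly select the vector
    whose distance to its nearest already-selected vector is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[start]]
    # d[j] = distance from vectors[j] to its nearest selected vector
    d = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: d[j])
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            d[j] = min(d[j], dist(v, vectors[i]))
    return selected
```

Already-selected vectors have distance 0 to the selected set, so they are never picked again; the result is a sample that spreads out over the weight-vector space, which is why the selections above mix clear matches, clear non-matches, and borderline cases.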
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
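The purity and entropy figures reported after each oracle round follow directly from the match / non-match counts. A minimal sketch, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total        # estimated match proportion
    purity = max(p, 1.0 - p)       # fraction of vectors in the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

For the 23 matches and 65 non-matches above this gives purity ≈ 0.739 and entropy ≈ 0.829, matching the values in the log; the match proportion 23/88 ≈ 0.261 is the cluster's estimated match proportion reported in the next loop.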
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
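The SVM step trains on the oracle-labelled vectors and splits the remaining cluster by predicted class. This is a hypothetical sketch using scikit-learn's `SVC`; the actual kernel and hyperparameters used in the run are not stated in the log.

```python
# Illustrative sketch only: the real system's SVM settings are unknown.
from sklearn import svm

def svm_split(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on oracle-labelled vectors, then split the rest of
    the cluster into predicted matches and predicted non-matches."""
    clf = svm.SVC()                    # default RBF kernel (assumption)
    clf.fit(train_vecs, train_labels)  # labels: 1 = match, 0 = non-match
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if not p]
    return matches, non_matches
```

The two resulting sub-clusters (here 101 and 847 vectors) are pushed back onto the queue, inheriting the parent cluster's purity and entropy estimates until they are sampled themselves.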
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(20)435_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 435), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)435_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
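The occurrence distribution above can be reproduced with a simple counter over the weight vectors (a sketch, assuming each vector can be hashed as a tuple):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    vec_counts = Counter(map(tuple, vectors))  # vector -> how often it occurs
    return Counter(vec_counts.values())        # occurrence count -> #vectors
```

As a sanity check against the run above: 768 unique vectors occurring once, 34 twice, 2 three times, and 1 twenty times sum to 768 + 68 + 6 + 20 = 862 vectors and 768 + 34 + 2 + 1 = 805 unique ones, matching the counts in the log.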
Identified 1 non-pure unique weight vectors (from 805 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 153 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 153 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)659_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 659), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)659_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 749
Number of entity ID pairs that occurred more than once: 0
Analyse set of 749 weight vectors
Containing 194 true matches and 555 true non-matches
(25.90% true matches)
Identified 707 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 672 (95.05%)
2 : 32 (4.53%)
3 : 2 (0.28%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 707 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.000 : 535
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 749
Number of unique weight vectors: 707
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(707, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 707 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 707 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 623 weight vectors
Based on 31 matches and 53 non-matches
Classified 286 matches and 337 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(286, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(337, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 337 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 337 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.538, 0.500, 0.818, 0.789, 0.750] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.750, 0.778, 0.471, 0.727, 0.684] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.833, 0.571, 0.727, 0.647, 0.857] (False)
[1.000, 0.000, 0.857, 0.286, 0.500, 0.643, 0.600] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.917, 0.786, 0.263, 0.500, 0.556] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.692, 0.292, 0.500, 0.818, 0.308] (False)
[0.800, 0.000, 0.625, 0.571, 0.467, 0.474, 0.667] (False)
[1.000, 0.000, 0.423, 0.478, 0.500, 0.813, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.333, 0.214, 0.333, 0.588, 0.476] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.583, 0.389, 0.471, 0.545, 0.474] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.385, 0.391, 0.667, 0.579, 0.824] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.000, 0.700, 0.818, 0.444, 0.619] (False)
[1.000, 0.000, 0.857, 0.444, 0.556, 0.235, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.750, 0.533, 0.294, 0.333, 0.429] (False)
[1.000, 0.000, 0.333, 0.750, 0.667, 0.667, 0.571] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.067, 0.550, 0.818, 0.727, 0.762] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing the file: diverg(20)213_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 213), dtype: object
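The precision, recall, and f-measure values in the block above derive from the tp/fp/fn counts in the standard way; a minimal sketch (the zero-denominator guards are an assumption):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

`prf(57, 0, 242)` yields precision 1, recall 0.190635, and f-measure 0.320225, matching the printed Series.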
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)213_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
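The occurrence histogram above can be built in two passes with `collections.Counter`: once to count how often each vector occurs, once over those counts. The toy vectors below are hypothetical:

```python
from collections import Counter

# Hypothetical weight vectors, each a tuple of similarity scores.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (1.0, 0.5), (0.9, 0.1)]

occurrence = Counter(vectors)             # vector -> how often it occurs
freq_dist = Counter(occurrence.values())  # occurrence count -> number of vectors
```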
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
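The "far" selection used above is a farthest-first (Gonzalez-style) traversal: repeatedly pick the vector with the greatest distance to its nearest already-selected vector. A sketch, assuming Euclidean distance and a random first pick (both assumptions; the tool's exact choices are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    # Farthest-first traversal: greedily maximise the distance to the
    # closest already-selected vector (distance metric is an assumption).
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    selected = [int(rng.integers(len(X)))]
    # distance of every vector to its nearest selected vector
    dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(dist.argmax())
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Because each pick maximises the minimum distance to the current sample, outliers and cluster extremes are chosen early, which is why the selected vectors above are so spread out.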
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
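The 948 remaining vectors are split by a classifier trained on the 23 oracle-labelled matches and 65 non-matches. The log only says "SVM"; as a self-contained stand-in, here is a Pegasos-style linear SVM trained by hinge-loss subgradient descent (the training scheme and hyperparameters are assumptions, not the tool's actual implementation):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    # Pegasos-style subgradient descent on the regularised hinge loss.
    # y must be in {-1, +1} (e.g. non-match / match).
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:   # margin violated: step and shrink
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                            # margin satisfied: only shrink w
                w = (1 - lr * lam) * w
    return w, b

def classify(X, w, b):
    """Predict -1 (non-match) or +1 (match) for each row of X."""
    return np.sign(np.asarray(X, dtype=float) @ w + b)
```

In the log's workflow the predicted matches and non-matches become the two child clusters pushed onto the queue in Loop 2.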
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 101 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(15)169_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 169), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)169_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 983
Number of entity ID pairs that occurred more than once: 0
Analyse set of 983 weight vectors
Containing 198 true matches and 785 true non-matches
(20.14% true matches)
Identified 941 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 906 (96.28%)
2 : 32 (3.40%)
3 : 2 (0.21%)
7 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 941 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.000 : 765
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 983
Number of unique weight vectors: 941
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(941, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 941 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 941 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 29 matches and 58 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 854 weight vectors
Based on 29 matches and 58 non-matches
Classified 144 matches and 710 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(710, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 29 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.92
- Size 710 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 710 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 3 matches and 73 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.240
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(20)906_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 906), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)906_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1038 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
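The "farthest first" selection logged above can be sketched as a greedy traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. This is a minimal sketch assuming Euclidean distance between weight vectors; the function and variable names are illustrative, not the tool's actual API.

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    minimum distance to the already-selected set is largest."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]
    # Distance from every vector to its nearest selected vector so far
    dists = np.linalg.norm(X - X[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))  # farthest from the current selection
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Because each step maximises the minimum distance to the sample so far, the selected vectors spread across the whole cluster rather than concentrating in one region, which is why the sampled lists above mix clear matches and clear non-matches.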
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 950 non-matches
46.0
Analysing the file: diverg(20)604_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 604), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)604_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 226 true matches and 857 true non-matches
(20.87% true matches)
Identified 1026 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 989 (96.39%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
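The duplicate analysis above (how often each unique weight vector occurs) amounts to a two-level count: first count the copies of each vector, then count how many vectors share each copy-count. A minimal sketch, with illustrative names:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map 'number of occurrences' -> 'how many unique vectors occur that often'."""
    per_vector = Counter(tuple(v) for v in vectors)  # copies of each vector
    return Counter(per_vector.values())              # distribution of those counts

# e.g. three distinct vectors, one of which appears twice:
dist = occurrence_distribution([(1, 0), (1, 0), (0, 1), (0.5, 0.5)])
```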
Identified 1 non-pure unique weight vector (from 1026 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
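The removal step above drops the minority-class copies of any unique weight vector whose copies carry conflicting labels (here, one vector with pureness 0.950 loses its single non-match copy). A minimal sketch; the tie-breaking rule and the match = 1 / non-match = 0 encoding are assumptions:

```python
from collections import defaultdict

def remove_minority_members(vectors, labels):
    """For each unique weight vector, compute its pureness (fraction of matches)
    and drop the minority-class copies of any non-pure vector."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept_vecs, kept_labs = [], []
    for vec, lab in zip(vectors, labels):
        labs = groups[tuple(vec)]
        match_frac = sum(labs) / len(labs)  # pureness as fraction of matches
        majority = match_frac >= 0.5        # tie-break toward match (assumption)
        if lab == majority:                 # keep only majority-class copies
            kept_vecs.append(vec)
            kept_labs.append(lab)
    return kept_vecs, kept_labs

# Four copies across two vectors; one group is non-pure (2 matches, 1 non-match):
kept, labs = remove_minority_members(
    [(1, 0), (1, 0), (1, 0), (0, 1)],
    [1, 1, 0, 1])  # 1 = match, 0 = non-match
```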
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1026
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1026, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1026 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1026 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 29 matches and 59 non-matches
Purity of oracle classification: 0.670
Entropy of oracle classification: 0.914
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
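The purity, entropy, and estimated match proportion reported for a cluster can be reproduced from the oracle's match/non-match counts: purity is the majority-class fraction and entropy is the binary entropy of the class split. A minimal sketch; the formulas are inferred from the logged values (29 matches and 59 non-matches give purity 0.670 and entropy 0.914, matching the figures printed in the next loop's queue):

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity, binary entropy, and match proportion of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total               # estimated match proportion
    purity = max(p, 1.0 - p)              # fraction in the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)   # binary entropy in bits
    return purity, entropy, p

purity, entropy, prop = cluster_stats(29, 59)  # counts from the log above
```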
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 938 weight vectors
Based on 29 matches and 59 non-matches
Classified 159 matches and 779 non-matches
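The SVM step — fit a classifier on the oracle-labelled sample, then split the remaining weight vectors into predicted matches and non-matches — can be sketched with scikit-learn. The library and kernel choice are assumptions; the log does not state which implementation is used:

```python
from sklearn.svm import SVC

def svm_classify(train_vectors, train_labels, remaining_vectors):
    """Fit an SVM on the manually classified sample and label the rest."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p == 1]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if p == 0]
    return matches, non_matches

# Tiny illustration: label 1 = match, 0 = non-match
m, n = svm_classify([[0.9, 0.9], [0.8, 1.0], [0.1, 0.0], [0.2, 0.1]],
                    [1, 1, 0, 0],
                    [[0.95, 0.85], [0.05, 0.15]])
```

The two predicted sets then become the two child clusters pushed onto the queue, which is why the queue length grows to 2 in the next loop.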
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(159, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
(779, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
Current size of match and non-match training data sets: 29 / 59
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 779 weight vectors
- Estimated match proportion 0.330
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 779 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.667, 0.000, 0.667, 0.500, 0.647, 0.556, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.750, 0.429, 0.526, 0.500, 0.846] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.000, 0.600, 0.818, 0.571, 0.524] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.462, 0.889, 0.455, 0.211, 0.375] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.412, 0.318, 0.421] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.233, 0.545, 0.714, 0.455, 0.238] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.850, 0.733, 0.652, 0.778, 0.474] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 2 matches and 74 non-matches
Purity of oracle classification: 0.974
Entropy of oracle classification: 0.176
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)205_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 205), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)205_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 69 matches and 842 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(69, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(842, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 69 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 37
Farthest first selection of 37 weight vectors from 69 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.933, 1.000, 0.952, 1.000, 1.000, 0.944, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.833, 1.000, 1.000, 0.935] (True)
[1.000, 1.000, 1.000, 1.000, 0.950, 0.923, 0.941] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.958, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
Perform oracle with 100.00% accuracy on 37 weight vectors
The oracle will correctly classify 37 weight vectors and wrongly classify 0
Classified 37 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 37 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)302_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 302), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)302_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1046
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1046 weight vectors
Containing 225 true matches and 821 true non-matches
(21.51% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 952 (96.26%)
2 : 34 (3.44%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 800
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1045
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 32 matches and 55 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 902 weight vectors
Based on 32 matches and 55 non-matches
Classified 330 matches and 572 non-matches
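The purity and entropy figures reported for each oracle round follow directly from the match/non-match split of the labelled sample (here 32 matches and 55 non-matches out of 87). A minimal sketch of the two measures; the function names are illustrative, not taken from the original code:

```python
import math

def purity(n_matches, n_non_matches):
    # Purity: fraction of samples belonging to the majority class.
    total = n_matches + n_non_matches
    return max(n_matches, n_non_matches) / total

def entropy(n_matches, n_non_matches):
    # Binary Shannon entropy (in bits) of the match proportion.
    p = n_matches / (n_matches + n_non_matches)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# The 87 oracle-labelled vectors of this round: 32 matches, 55 non-matches.
print(round(purity(32, 55), 3))   # 0.632
print(round(entropy(32, 55), 3))  # 0.949
```

The match proportion 32/87 ≈ 0.368 is also what the log carries forward as the "estimated match proportion" of the child clusters.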
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(330, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
(572, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
Current size of match and non-match training data sets: 32 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 330 weight vectors
- Estimated match proportion 0.368
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 330 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.909, 1.000, 1.000, 1.000, 0.947] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 41 matches and 29 non-matches
Purity of oracle classification: 0.586
Entropy of oracle classification: 0.979
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 29
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
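The "farthest first" sampling used throughout these runs can be sketched as a greedy traversal: start from an arbitrary vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. This is a standard sketch; the actual seeding strategy and distance metric of the original implementation are assumptions:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    # Greedy farthest-first selection of k vectors (Euclidean distance).
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]      # arbitrary starting vector
    min_dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))          # farthest from current selection
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# e.g. pick 5 well-spread weight vectors out of 100 random 7-dimensional ones
idx = farthest_first(np.random.default_rng(1).random((100, 7)), 5)
print(len(idx))  # 5
```

Because each step maximises the distance to the selection so far, the sample tends to cover the extremes of the weight-vector space, which is why the selected lists above mix very high and very low similarity vectors.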
Analysing file: diverg(15)747_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 747), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)747_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 733
Number of entity ID pairs that occurred more than once: 0
Analyse set of 733 weight vectors
Containing 210 true matches and 523 true non-matches
(28.65% true matches)
Identified 699 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 682 (97.57%)
2 : 14 (2.00%)
3 : 2 (0.29%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 699 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 178
0.941 : 1 (minority-class weight vectors with this pureness are removed)
0.000 : 520
Removed 1 non-pure weight vector
Final number of weight vectors to use: 732
Number of unique weight vectors: 699
Time to load and analyse the weight vector file: 0.01 sec
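The duplicate analysis above (count of unique vectors plus the occurrence frequency distribution) amounts to counting identical rows; a minimal sketch, assuming weight vectors are represented as tuples:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # Map each distinct vector to its count, then count how many
    # distinct vectors occur once, twice, three times, ...
    counts = Counter(map(tuple, weight_vectors))
    return Counter(counts.values())

vecs = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.2, 0.3), (0.9, 0.1)]
dist = occurrence_distribution(vecs)
print(dist[1], dist[2])  # 1 vector occurs once, 2 vectors occur twice
```

Applied to the 733 vectors of this file, such a tally yields the 699 unique vectors and the "682 occur once, 14 twice, 2 three times, 1 seventeen times" distribution in the log.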
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(699, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 699 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 699 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 30 matches and 54 non-matches
Purity of oracle classification: 0.643
Entropy of oracle classification: 0.940
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 615 weight vectors
Based on 30 matches and 54 non-matches
Classified 141 matches and 474 non-matches
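The SVM step trains on the oracle-labelled vectors (here 30 matches and 54 non-matches) and classifies the remaining unlabelled vectors, splitting the cluster into a predicted-match and a predicted-non-match child. A minimal sketch using scikit-learn; the kernel and parameter choices are assumptions, not taken from the original code:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labeled_X, labels, unlabeled_X):
    # Train on the oracle-classified vectors, then partition the rest
    # into predicted matches (label 1) and predicted non-matches (label 0).
    clf = SVC(kernel="linear").fit(labeled_X, labels)
    pred = clf.predict(unlabeled_X)
    return unlabeled_X[pred == 1], unlabeled_X[pred == 0]

rng = np.random.default_rng(0)
# Toy stand-in data: matches cluster near 0.9, non-matches near 0.2.
X = np.vstack([rng.normal(0.9, 0.05, (30, 7)), rng.normal(0.2, 0.05, (54, 7))])
y = np.array([1] * 30 + [0] * 54)
matches, non_matches = svm_split(X, y, rng.random((615, 7)))
print(len(matches) + len(non_matches))  # 615
```

The two resulting partitions become the new clusters pushed onto the queue, which is why the queue length grows to 2 in the following loop.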
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
(474, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
Current size of match and non-match training data sets: 30 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 141 weight vectors
- Estimated match proportion 0.357
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 141 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 50 matches and 4 non-matches
Purity of oracle classification: 0.926
Entropy of oracle classification: 0.381
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(10)55_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 55), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)55_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 700
Number of entity ID pairs that occurred more than once: 0
Analyse set of 700 weight vectors
Containing 214 true matches and 486 true non-matches
(30.57% true matches)
Identified 665 unique weight vectors
Frequency distribution of occurences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 650 (97.74%)
2 : 12 (1.80%)
3 : 2 (0.30%)
20 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 665 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 179
0.950 : 1 (minority-class weight vectors with this pureness are removed)
0.000 : 485
Removed 1 non-pure weight vector
Final number of weight vectors to use: 699
Number of unique weight vectors: 665
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(665, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 665 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 665 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 581 weight vectors
Based on 31 matches and 53 non-matches
Classified 314 matches and 267 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(314, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(267, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 314 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 314 vectors
The selected farthest weight vectors are:
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.890, 1.000, 0.281, 0.136, 0.183, 0.250, 0.163] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 42 matches and 28 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
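The purity and entropy figures in these oracle summaries follow directly from the match / non-match counts: purity is the majority-class fraction, and entropy is the base-2 Shannon entropy of the split. A minimal sketch (the helper name `oracle_stats` is ours, not from the original program):

```python
import math

def oracle_stats(n_matches: int, n_non_matches: int):
    """Purity and entropy of a binary oracle classification.

    Purity is the fraction of the majority class; entropy is the
    base-2 Shannon entropy of the match / non-match split.
    """
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The 42 matches / 28 non-matches split reported above:
purity, entropy = oracle_stats(42, 28)
print(round(purity, 3), round(entropy, 3))  # 0.6 0.971
```

For the 42 / 28 split this gives purity 0.600 and entropy 0.971, matching the log.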
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(15)650_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 650), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)650_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 708
Number of entity ID pairs that occurred more than once: 0
Analyse set of 708 weight vectors
Containing 196 true matches and 512 true non-matches
(27.68% true matches)
Identified 684 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 667 (97.51%)
2 : 14 (2.05%)
3 : 2 (0.29%)
7 : 1 (0.15%)
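Distributions like the one above can be computed with a `collections.Counter` over the vectors (treated as tuples so they are hashable); a sketch on made-up data:

```python
from collections import Counter

# Hypothetical input: each weight vector as a tuple of similarity weights.
weight_vectors = [
    (1.0, 0.5, 0.3),
    (1.0, 0.5, 0.3),
    (0.2, 0.9, 0.7),
    (0.4, 0.1, 0.8),
]

# Count how often each distinct vector occurs ...
vector_counts = Counter(weight_vectors)
print("Identified %d unique weight vectors" % len(vector_counts))

# ... then build the "occurrence : number of vectors" distribution.
freq_dist = Counter(vector_counts.values())
for occ, num in sorted(freq_dist.items()):
    print("%d : %d (%.2f%%)" % (occ, num, 100.0 * num / len(vector_counts)))
    # prints: 1 : 2 (66.67%)   then   2 : 1 (33.33%)
```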
Identified 0 non-pure unique weight vectors (from 684 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.000 : 510
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 708
Number of unique weight vectors: 684
Time to load and analyse the weight vector file: 0.04 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(684, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 684 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 684 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
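The "farthest first" selections in this log can be sketched as a greedy farthest-first traversal: seed the sample with one random vector, then repeatedly add the remaining vector whose distance to the already-selected set is largest. A minimal pure-Python version, assuming Euclidean distance (the original program's distance metric isn't shown in the log):

```python
import random

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: start from a random vector, then
    repeatedly add the vector farthest from the selected set."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    rng = random.Random(seed)
    remaining = list(vectors)
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    # min_dist[i]: distance from remaining[i] to its nearest selected vector
    min_dist = [dist(v, selected[0]) for v in remaining]
    while len(selected) < k and remaining:
        i = max(range(len(remaining)), key=min_dist.__getitem__)
        chosen = remaining.pop(i)
        min_dist.pop(i)
        selected.append(chosen)
        # Fold the newly chosen vector into the nearest-selected distances.
        min_dist = [min(d, dist(v, chosen)) for d, v in zip(min_dist, remaining)]
    return selected

# Hypothetical 1-D example: whatever the random start, the selected points
# end up well spread out (pairwise at least 0.4 apart here).
pts = [(0.0,), (0.1,), (0.5,), (0.9,), (1.0,)]
sel = farthest_first(pts, 3)
```

This spreading is why the sampled vectors above mix clear matches and clear non-matches rather than clustering in one region of weight space.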
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 33 matches and 51 non-matches
Purity of oracle classification: 0.607
Entropy of oracle classification: 0.967
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 600 weight vectors
Based on 33 matches and 51 non-matches
Classified 139 matches and 461 non-matches
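The SVM step trains on the oracle-labelled sample and splits the rest of the cluster into predicted matches and non-matches. A sketch assuming scikit-learn's `SVC`; the kernel and parameters used by the original program are not visible in the log:

```python
from sklearn.svm import SVC

def svm_classify(train_vectors, train_labels, unlabelled_vectors):
    """Train an SVM on the oracle-labelled vectors and split the remaining
    unlabelled vectors into predicted matches and non-matches."""
    clf = SVC(kernel="linear")  # assumption: kernel not stated in the log
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(unlabelled_vectors)
    matches = [v for v, p in zip(unlabelled_vectors, preds) if p]
    non_matches = [v for v, p in zip(unlabelled_vectors, preds) if not p]
    return matches, non_matches
```

The two predicted groups then become the two child clusters pushed onto the queue, which is why the queue length grows from 1 to 2 in the next loop.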
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(139, 0.6071428571428571, 0.9666186325481028, 0.39285714285714285)
(461, 0.6071428571428571, 0.9666186325481028, 0.39285714285714285)
Current size of match and non-match training data sets: 33 / 51
Selected cluster (queue ordering: random) with:
- Purity 0.61 and entropy 0.97
- Size 461 weight vectors
- Estimated match proportion 0.393
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 461 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 3 matches and 73 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.240
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(10)790_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981481
recall 0.177258
f-measure 0.300283
da 54
dm 0
ndm 0
tp 53
fp 1
tn 4.76529e+07
fn 246
Name: (10, 1 - acm diverg, 790), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)790_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 758
Number of entity ID pairs that occurred more than once: 0
Analyse set of 758 weight vectors
Containing 208 true matches and 550 true non-matches
(27.44% true matches)
Identified 722 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 705 (97.65%)
2 : 14 (1.94%)
3 : 2 (0.28%)
19 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 722 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.947 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 547
Removed 1 non-pure weight vector
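A unique weight vector is "non-pure" when copies of it occur with both labels; as above, only the minority-class copies are removed. A sketch of that filter (the helper name `remove_non_pure` is ours, and the tie-breaking at pureness 0.5 is our assumption):

```python
from collections import defaultdict

def remove_non_pure(labelled_vectors):
    """Drop minority-class copies of weight vectors that occur with both
    labels. `labelled_vectors` is a list of (vector_tuple, is_match) pairs."""
    by_vector = defaultdict(list)
    for vec, is_match in labelled_vectors:
        by_vector[vec].append(is_match)

    kept = []
    for vec, labels in by_vector.items():
        pureness = sum(labels) / len(labels)  # fraction of match labels
        if 0.0 < pureness < 1.0:
            # Non-pure: keep only the majority-class copies.
            majority = pureness >= 0.5
            kept.extend((vec, majority) for lab in labels if lab == majority)
        else:
            kept.extend((vec, lab) for lab in labels)
    return kept

# Hypothetical example mirroring the 0.947-pureness vector above:
# 19 copies of one vector (18 matches, 1 non-match) plus one pure non-match.
data = ([((1.0, 0.9), True)] * 18
        + [((1.0, 0.9), False), ((0.1, 0.2), False)])
print(len(remove_non_pure(data)))  # 19: the lone minority copy is dropped
```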
Final number of weight vectors to use: 757
Number of unique weight vectors: 722
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(722, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 722 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 722 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 33 matches and 52 non-matches
Purity of oracle classification: 0.612
Entropy of oracle classification: 0.964
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 637 weight vectors
Based on 33 matches and 52 non-matches
Classified 307 matches and 330 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(307, 0.611764705882353, 0.9636512739945753, 0.38823529411764707)
(330, 0.611764705882353, 0.9636512739945753, 0.38823529411764707)
Current size of match and non-match training data sets: 33 / 52
Selected cluster (queue ordering: random) with:
- Purity 0.61 and entropy 0.96
- Size 307 weight vectors
- Estimated match proportion 0.388
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 307 vectors
The selected farthest weight vectors are:
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.890, 1.000, 0.281, 0.136, 0.183, 0.250, 0.163] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 43 matches and 27 non-matches
Purity of oracle classification: 0.614
Entropy of oracle classification: 0.962
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 27
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
54.0
Analysing file: diverg(10)681_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 681), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)681_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 767
Number of entity ID pairs that occurred more than once: 0
Analyse set of 767 weight vectors
Containing 196 true matches and 571 true non-matches
(25.55% true matches)
Identified 725 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 690 (95.17%)
2 : 32 (4.41%)
3 : 2 (0.28%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 725 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.000 : 551
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 767
Number of unique weight vectors: 725
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(725, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 725 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 725 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 25 matches and 60 non-matches
Purity of oracle classification: 0.706
Entropy of oracle classification: 0.874
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
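The purity, entropy, and estimated match proportion reported throughout this log follow directly from the match/non-match counts of each classified set. A minimal sketch (the function name is illustrative, not from the original tool):

```python
import math

def classification_stats(num_matches, num_non_matches):
    """Purity, binary entropy, and estimated match proportion of a set
    of oracle-classified weight vectors."""
    total = num_matches + num_non_matches
    p = num_matches / total                        # estimated match proportion
    purity = max(num_matches, num_non_matches) / total
    entropy = 0.0
    for q in (p, 1 - p):                           # binary Shannon entropy
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy, p
```

With the 25 matches and 60 non-matches above this gives purity 60/85 ≈ 0.706 and entropy ≈ 0.874, matching the values in the log.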
SVM classification of 640 weight vectors
Based on 25 matches and 60 non-matches
Classified 98 matches and 542 non-matches
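The SVM step trains on the oracle-labelled vectors and splits the remaining unlabelled weight vectors into predicted matches and non-matches. The original tool's classifier implementation is not shown in this log; as an illustration only, a linear SVM can be trained with a Pegasos-style sub-gradient scheme (all names and hyperparameters below are hypothetical):

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM.
    X: list of feature vectors; y: labels in {-1, +1}."""
    rng = random.Random(seed)
    d = len(X[0])
    w, b, t = [0.0] * d, 0.0, 0
    for _ in range(epochs):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)                  # decaying learning rate
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1:                          # hinge loss active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In the log, the 25 matches and 60 non-matches classified by the oracle play the role of the training data, and `predict` is applied to the 640 remaining weight vectors.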
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(98, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
(542, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
Current size of match and non-match training data sets: 25 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 98 weight vectors
- Estimated match proportion 0.294
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 98 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.420, 1.000, 1.000, 1.000, 1.000, 1.000, 0.947] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 42 matches and 2 non-matches
Purity of oracle classification: 0.955
Entropy of oracle classification: 0.267
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
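The farthest-first selection used in each loop greedily picks, one at a time, the weight vector with the largest minimum distance to those already chosen, so the sample spreads out over the cluster. A minimal sketch (assuming Euclidean distance and a fixed starting vector; the actual tool may differ):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly select the vector that
    maximises the minimum Euclidean distance to the already selected set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        far = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(far)
        remaining.remove(far)
    return selected
```

For the step above, `k` would be the sample size (44) and `vectors` the 98 weight vectors of the selected cluster.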
Reached end of manual classification budget
68.0
Analysing the file: diverg(20)517_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 517), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)517_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1092
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1092 weight vectors
Containing 226 true matches and 866 true non-matches
(20.70% true matches)
Identified 1035 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 998 (96.43%)
2 : 34 (3.29%)
3 : 2 (0.19%)
20 : 1 (0.10%)
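The frequency distribution above can be computed by counting duplicate weight vectors, e.g. with `collections.Counter` (the function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """How many unique weight vectors occur exactly n times,
    as {n: (count, percentage of unique vectors)}."""
    counts = Counter(map(tuple, weight_vectors))   # copies per unique vector
    dist = Counter(counts.values())                # occurrence -> how many
    total = len(counts)
    return {n: (c, 100.0 * c / total) for n, c in sorted(dist.items())}
```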
Identified 1 non-pure unique weight vector (from 1035 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
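The removal step keeps only pure unique vectors: for each unique weight vector the fraction of true matches among its copies is computed, and for non-pure vectors the minority-class copies are dropped. A sketch under those assumptions (names are illustrative):

```python
from collections import defaultdict

def remove_minority_copies(labeled_vectors):
    """Drop minority-class copies of non-pure unique weight vectors.
    labeled_vectors: list of (vector, is_match) pairs."""
    groups = defaultdict(list)
    for vec, is_match in labeled_vectors:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)       # fraction of true matches
        majority_is_match = pureness >= 0.5
        for is_match in labels:
            # pure vectors kept whole; otherwise keep only the majority class
            if pureness in (0.0, 1.0) or is_match == majority_is_match:
                kept.append((list(vec), is_match))
    return kept
```

This mirrors the log: a unique vector with pureness 0.950 has its single non-match copy removed, leaving 1091 of 1092 weight vectors.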
Final number of weight vectors to use: 1091
Number of unique weight vectors: 1035
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1035, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1035 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1035 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 947 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 816 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(816, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 816 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 816 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 11 matches and 60 non-matches
Purity of oracle classification: 0.845
Entropy of oracle classification: 0.622
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)772_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 772), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)772_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 528
Number of entity ID pairs that occurred more than once: 0
Analyse set of 528 weight vectors
Containing 224 true matches and 304 true non-matches
(42.42% true matches)
Identified 489 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 470 (96.11%)
2 : 16 (3.27%)
3 : 2 (0.41%)
20 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 489 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 301
Removed 1 non-pure weight vector
Final number of weight vectors to use: 527
Number of unique weight vectors: 489
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(489, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 489 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 489 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 36 matches and 44 non-matches
Purity of oracle classification: 0.550
Entropy of oracle classification: 0.993
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 409 weight vectors
Based on 36 matches and 44 non-matches
Classified 208 matches and 201 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(208, 0.55, 0.9927744539878084, 0.45)
(201, 0.55, 0.9927744539878084, 0.45)
Current size of match and non-match training data sets: 36 / 44
Selected cluster (queue ordering: random) with:
- Purity 0.55 and entropy 0.99
- Size 208 weight vectors
- Estimated match proportion 0.450
Sample size for this cluster: 65
Farthest first selection of 65 weight vectors from 208 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.261, 0.174, 0.148, 0.186, 0.148] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.750, 1.000, 0.214, 0.184, 0.250, 0.267, 0.111] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 65 weight vectors
The oracle will correctly classify 65 weight vectors and wrongly classify 0
Classified 45 matches and 20 non-matches
Purity of oracle classification: 0.692
Entropy of oracle classification: 0.890
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 20
Number of false non-matches: 0
Deleted 65 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)897_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 897), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)897_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1081
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1081 weight vectors
Containing 209 true matches and 872 true non-matches
(19.33% true matches)
Identified 1034 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1034 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 851
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1080
Number of unique weight vectors: 1034
Time to load and analyse the weight vector file: 0.01 sec
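The pureness step reported above groups identical weight vectors, computes the fraction of matches per group, and removes the minority-class copies from any mixed group. A minimal sketch of that filtering follows; the data layout (list of `(weights, is_match)` pairs) is an assumption, since the loader's real structures are not shown in the log:

```python
from collections import defaultdict

def remove_non_pure(vectors):
    """Group identical weight vectors, compute the match fraction
    (pureness) per group, and drop minority-class vectors from any
    group that mixes matches and non-matches."""
    groups = defaultdict(list)
    for weights, is_match in vectors:
        groups[tuple(weights)].append(is_match)
    kept = []
    for weights, labels in groups.items():
        pureness = sum(labels) / len(labels)  # fraction of matches
        majority = pureness >= 0.5
        for is_match in labels:
            if pureness in (0.0, 1.0) or is_match == majority:
                kept.append((list(weights), is_match))
    return kept

# A group of 12 identical vectors with 11 matches has pureness
# 11/12 ~ 0.917; only the single non-match copy is removed.
data = [([0.9, 1.0], True)] * 11 + [([0.9, 1.0], False), ([0.1, 0.0], False)]
print(len(remove_non_pure(data)))  # 12 of the 13 vectors kept
```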
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1034, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1034 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1034 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
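The "far" method behind the selection listed above is a farthest-first traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the similarity space. A minimal sketch (Euclidean distance and starting from the first vector are assumptions):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Select k indices by farthest-first traversal: each pick
    maximises the minimum Euclidean distance to those already chosen."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]
    # Distance of every vector to its nearest selected vector so far.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

pts = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.5, 0.5]]
print(farthest_first(pts, 2))  # [0, 1]: the point farthest from the start
```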
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
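The purity and entropy reported for this oracle sample follow directly from the match/non-match counts: purity is the majority-class fraction and entropy the binary Shannon entropy of the match proportion, in bits. A quick sketch of these presumed formulas, reproducing the figures above:

```python
import math

def purity_entropy(matches, non_matches):
    """Purity is the majority-class fraction; entropy is the binary
    Shannon entropy of the match proportion, in bits."""
    total = matches + non_matches
    p = matches / total
    purity = max(p, 1 - p)
    if p in (0.0, 1.0):
        entropy = 0.0
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

purity, entropy = purity_entropy(23, 65)
print(round(purity, 3), round(entropy, 3))  # 0.739 0.829, as in the log
```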
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 946 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 845 non-matches
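The split step above trains a classifier on the oracle-labelled sample and partitions the remaining unlabelled vectors of the cluster into predicted matches and non-matches. A hedged sketch with scikit-learn (the kernel, parameters, and the stand-in random data are all assumptions; the log does not show them):

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)

# Stand-in for the oracle-labelled training sample: 88 similarity
# vectors over 7 weights, with synthetic match/non-match labels.
X_train = rng.random((88, 7))
y_train = (X_train.mean(axis=1) > 0.5).astype(int)

# Stand-in for the 946 remaining unlabelled weight vectors.
X_rest = rng.random((946, 7))

clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
clf.fit(X_train, y_train)
pred = clf.predict(X_rest)
print(f"Classified {int(pred.sum())} matches "
      f"and {int((pred == 0).sum())} non-matches")
```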
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(845, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 101 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(15)932_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 932), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)932_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 830
Number of entity ID pairs that occurred more than once: 0
Analyse set of 830 weight vectors
Containing 207 true matches and 623 true non-matches
(24.94% true matches)
Identified 783 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 748 (95.53%)
2 : 32 (4.09%)
3 : 2 (0.26%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 783 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 602
Removed 1 non-pure weight vector
Final number of weight vectors to use: 829
Number of unique weight vectors: 783
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(783, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 783 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 783 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 24 matches and 61 non-matches
Purity of oracle classification: 0.718
Entropy of oracle classification: 0.859
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 698 weight vectors
Based on 24 matches and 61 non-matches
Classified 97 matches and 601 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(97, 0.7176470588235294, 0.8586370819183629, 0.2823529411764706)
(601, 0.7176470588235294, 0.8586370819183629, 0.2823529411764706)
Current size of match and non-match training data sets: 24 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 601 weight vectors
- Estimated match proportion 0.282
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 601 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.423, 0.478, 0.500, 0.813, 0.545] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 17 matches and 52 non-matches
Purity of oracle classification: 0.754
Entropy of oracle classification: 0.805
Number of true matches: 17
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)473_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 473), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)473_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 671
Number of entity ID pairs that occurred more than once: 0
Analyse set of 671 weight vectors
Containing 199 true matches and 472 true non-matches
(29.66% true matches)
Identified 626 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 592 (94.57%)
2 : 31 (4.95%)
3 : 2 (0.32%)
11 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 626 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 451
Removed 1 non-pure weight vector
Final number of weight vectors to use: 670
Number of unique weight vectors: 626
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(626, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 626 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 626 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 25 matches and 58 non-matches
Purity of oracle classification: 0.699
Entropy of oracle classification: 0.883
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 543 weight vectors
Based on 25 matches and 58 non-matches
Classified 142 matches and 401 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6987951807228916, 0.8827586787955115, 0.30120481927710846)
(401, 0.6987951807228916, 0.8827586787955115, 0.30120481927710846)
Current size of match and non-match training data sets: 25 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 142 weight vectors
- Estimated match proportion 0.301
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
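The "farthest first" selections throughout this log follow the classic greedy traversal: start from a seed vector and repeatedly add the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the whole cluster. A minimal sketch, assuming Euclidean distance and an arbitrary seed (both are assumptions; the logged run's metric and seeding rule are not shown):

```python
import numpy as np

def farthest_first(vectors, k, seed_idx=0):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    minimum Euclidean distance to the already-selected set is largest."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [seed_idx]
    # Distance from every vector to its nearest selected vector so far.
    min_dist = np.linalg.norm(vectors - vectors[seed_idx], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

pts = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(farthest_first(pts, 3))  # [0, 2, 3]
```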
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
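The purity and entropy figures reported after each oracle round are the majority-class fraction and the binary Shannon entropy (in bits) of the match/non-match split. A minimal sketch reproducing the numbers above (50 matches, 2 non-matches):

```python
import math

def purity_entropy(n_match, n_non_match):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy (in bits) of the match/non-match split."""
    total = n_match + n_non_match
    p = n_match / total
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return max(p, 1 - p), entropy

# Reproduces the figures above: 50 matches, 2 non-matches.
purity, entropy = purity_entropy(50, 2)
print(round(purity, 3), round(entropy, 3))  # 0.962 0.235
```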
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analyzing file: diverg(10)498_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984848
recall 0.217391
f-measure 0.356164
da 66
dm 0
ndm 0
tp 65
fp 1
tn 4.76529e+07
fn 234
Name: (10, 1 - acm diverg, 498), dtype: object
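The precision, recall, and f-measure in the per-file summary follow directly from the reported tp, fp, and fn counts (with fn = 234 there are 299 true matches in total). A minimal sketch of the arithmetic:

```python
# Reproduces the summary metrics above from tp=65, fp=1, fn=234.
tp, fp, fn = 65, 1, 234

precision = tp / (tp + fp)   # fraction of declared matches that are real
recall = tp / (tp + fn)      # fraction of the 299 true matches found
f_measure = 2 * precision * recall / (precision + recall)

print(round(precision, 6))   # 0.984848
print(round(recall, 6))      # 0.217391
print(round(f_measure, 6))   # 0.356164
```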
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)498_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 224
Number of entity ID pairs that occurred more than once: 0
Analyse set of 224 weight vectors
Containing 176 true matches and 48 true non-matches
(78.57% true matches)
Identified 199 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 188 (94.47%)
2 : 8 (4.02%)
3 : 2 (1.01%)
14 : 1 (0.50%)
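The occurrence histogram above can be reproduced by counting exact duplicate vectors and then tallying the counts. A minimal sketch using `collections.Counter` (the tuple conversion is an assumption about how vectors are made hashable):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Count how often each exact weight vector occurs, then tally how
    many unique vectors share each occurrence count."""
    per_vector = Counter(tuple(v) for v in vectors)
    return Counter(per_vector.values())

# Toy example: three unique vectors, one of them duplicated.
vecs = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9)]
print(sorted(occurrence_distribution(vecs).items()))  # [(1, 2), (2, 1)]
```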
Identified 1 non-pure unique weight vector (from 199 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 151
0.929 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 47
Removed 1 non-pure weight vector
Final number of weight vectors to use: 223
Number of unique weight vectors: 199
Time to load and analyse the weight vector file: 0.00 sec
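The non-pure vector cleanup above removes the minority-class copies of any unique weight vector that carries both labels (here one vector occurring 14 times with pureness 13/14 ≈ 0.929, so its single non-match copy is dropped). A minimal sketch, assuming rows of `(vector, is_match)` pairs; this representation is hypothetical, not the actual data structure:

```python
from collections import defaultdict

def remove_minority(rows):
    """Drop minority-class copies of any unique weight vector that
    carries both match and non-match labels."""
    groups = defaultdict(list)
    for vec, is_match in rows:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, is_match in rows:
        labels = groups[tuple(vec)]
        pureness = sum(labels) / len(labels)
        if pureness in (0.0, 1.0):          # pure vector: keep every copy
            kept.append((vec, is_match))
        elif is_match == (pureness > 0.5):  # keep only the majority class
            kept.append((vec, is_match))
    return kept

# 14 copies of one vector: 13 matches + 1 non-match (pureness ~0.929),
# plus one pure non-match vector. The single minority copy is removed.
rows = [((1.0, 1.0), True)] * 13 + [((1.0, 1.0), False), ((0.0, 0.0), False)]
print(len(remove_minority(rows)))  # 14
```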
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(199, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 199 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 65
Perform initial selection using "far" method
Farthest first selection of 65 weight vectors from 199 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 65 weight vectors
The oracle will correctly classify 65 weight vectors and wrongly classify 0
Classified 36 matches and 29 non-matches
Purity of oracle classification: 0.554
Entropy of oracle classification: 0.992
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 29
Number of false non-matches: 0
Deleted 65 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 134 weight vectors
Based on 36 matches and 29 non-matches
Classified 125 matches and 9 non-matches
Non-match cluster not large enough for required sample size
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 1
Number of manual oracle classifications performed: 65
Size, purity, entropy, and estimated match proportion of clusters in queue:
(125, 0.5538461538461539, 0.9916178297881032, 0.5538461538461539)
Current size of match and non-match training data sets: 36 / 29
Selected cluster with (queue ordering: random):
- Purity 0.55 and entropy 0.99
- Size 125 weight vectors
- Estimated match proportion 0.554
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 125 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 45 matches and 9 non-matches
Purity of oracle classification: 0.833
Entropy of oracle classification: 0.650
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
66.0
Analyzing file: diverg(10)82_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 82), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)82_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 770
Number of entity ID pairs that occurred more than once: 0
Analyse set of 770 weight vectors
Containing 212 true matches and 558 true non-matches
(27.53% true matches)
Identified 718 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 683 (95.13%)
2 : 32 (4.46%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 718 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 537
Removed 1 non-pure weight vector
Final number of weight vectors to use: 769
Number of unique weight vectors: 718
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(718, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 718 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 718 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 634 weight vectors
Based on 31 matches and 53 non-matches
Classified 293 matches and 341 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(293, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(341, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 341 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 341 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.367, 0.667, 0.583, 0.625, 0.316] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.000, 0.600, 0.818, 0.571, 0.524] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.333, 0.214, 0.333, 0.588, 0.476] (False)
[1.000, 0.000, 0.438, 0.500, 0.467, 0.529, 0.611] (False)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[0.857, 0.000, 0.500, 0.389, 0.235, 0.045, 0.526] (False)
[1.000, 0.000, 0.476, 0.179, 0.500, 0.412, 0.357] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.833, 0.571, 0.727, 0.647, 0.857] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.583, 0.875, 0.727, 0.833, 0.643] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analyzing file: diverg(20)614_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 614), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)614_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 30 matches and 58 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 30 matches and 58 non-matches
Classified 179 matches and 760 non-matches
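The purity and entropy figures reported after each oracle call follow directly from the match/non-match counts. A minimal sketch, assuming purity is the majority-class fraction and entropy is binary Shannon entropy (this reproduces the values logged above; the function name is illustrative):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total          # estimated match proportion
    purity = max(p, 1.0 - p)         # fraction of vectors in the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The 88-vector sample above: 30 matches, 58 non-matches
purity, entropy = purity_entropy(30, 58)
```

With 30 matches and 58 non-matches this yields purity 0.659 and entropy 0.926, matching the cluster statistics carried into the queue.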
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(179, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
(760, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
Current size of match and non-match training data sets: 30 / 58
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 760 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 760 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
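The "farthest first selection" steps throughout this log can be sketched as a greedy max-min selection over Euclidean distance: start from one vector, then repeatedly add the vector whose nearest already-selected neighbour is farthest away. The function name and seeding below are illustrative, not the tool's actual implementation:

```python
import random

def farthest_first(vectors, k, seed=42):
    """Greedy max-min (farthest-first) selection of k vectors."""
    rng = random.Random(seed)
    remaining = list(vectors)
    # Start from a randomly chosen vector
    selected = [remaining.pop(rng.randrange(len(remaining)))]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    while remaining and len(selected) < k:
        # Pick the candidate whose minimum distance to the selected set is largest
        idx = max(range(len(remaining)),
                  key=lambda i: min(dist(remaining[i], s) for s in selected))
        selected.append(remaining.pop(idx))
    return selected
```

This spreads the oracle's labelling budget over the extremes of the cluster rather than its dense centre, which is why the sampled vectors above contain so many 0.000/1.000 components.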
Analysing file: diverg(15)300_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 300), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)300_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 799
Number of entity ID pairs that occurred more than once: 0
Analyse set of 799 weight vectors
Containing 224 true matches and 575 true non-matches
(28.04% true matches)
Identified 760 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 741 (97.50%)
2 : 16 (2.11%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 760 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 572
Removed 1 non-pure weight vector
Final number of weight vectors to use: 798
Number of unique weight vectors: 760
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
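The frequency and pureness analysis performed at load time amounts to counting duplicate weight vectors and, for each unique vector, the fraction of its occurrences labelled as a match; vectors that are neither purely matches nor purely non-matches are removed. A sketch with illustrative data (not the actual file contents):

```python
from collections import Counter

# Illustrative labelled weight vectors: (vector, is_match)
labelled = [
    ((1.0, 0.9), True), ((1.0, 0.9), True),   # duplicate, pure match
    ((0.1, 0.2), False),                       # unique, pure non-match
    ((0.5, 0.5), True), ((0.5, 0.5), False),   # non-pure: match fraction 0.5
]

occ = Counter(vec for vec, _ in labelled)
# Occurrence : number of unique vectors occurring that often
freq_dist = Counter(occ.values())

# Pureness = fraction of a vector's occurrences labelled as a match
match_count = Counter(vec for vec, m in labelled if m)
pureness = {vec: match_count[vec] / n for vec, n in occ.items()}
non_pure = [vec for vec, p in pureness.items() if 0.0 < p < 1.0]
```

In the run above a single vector with pureness 0.950 is non-pure, and its one minority-class occurrence is dropped (798 vectors remain from 799).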
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(760, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 760 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 760 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 675 weight vectors
Based on 29 matches and 56 non-matches
Classified 149 matches and 526 non-matches
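The SVM step trains on the oracle-labelled sample and splits the remaining unlabelled weight vectors into predicted matches and non-matches, which become the two child clusters pushed onto the queue. A minimal sketch, assuming a scikit-learn style linear SVM (the actual kernel and parameters are not shown in the log):

```python
import numpy as np
from sklearn.svm import SVC  # assumption: a scikit-learn style SVM

def svm_split(train_vecs, train_labels, unlabelled_vecs):
    """Train an SVM on the oracle-labelled sample and split the remaining
    unlabelled weight vectors into predicted matches and non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(np.asarray(train_vecs), np.asarray(train_labels))
    pred = clf.predict(np.asarray(unlabelled_vecs))
    matches = [v for v, p in zip(unlabelled_vecs, pred) if p]
    non_matches = [v for v, p in zip(unlabelled_vecs, pred) if not p]
    return matches, non_matches
```

Note that both child clusters inherit the parent's purity/entropy estimate until they are sampled themselves, which is why the two queue entries above share identical statistics.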
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(526, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 149 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 149 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 49 matches and 6 non-matches
Purity of oracle classification: 0.891
Entropy of oracle classification: 0.497
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
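The overall loop traced in each run of this log — pop a cluster from the queue, label a sample via the oracle, delete the labelled vectors, split the remainder into sub-clusters, and stop when the manual classification budget is exhausted — can be skeletonised as follows. All names are illustrative, not the tool's actual API:

```python
from collections import deque

def active_learning_loop(clusters, budget, sample_size, oracle, split):
    """Skeleton of the budget-limited cluster-queue loop.

    `sample_size` chooses how many vectors to sample from a cluster,
    `oracle` labels them (the manual classifications counted in the log),
    and `split` partitions the remainder, e.g. with an SVM trained on
    all labels gathered so far.
    """
    queue = deque(clusters)
    labelled = 0
    while queue and labelled < budget:
        cluster = queue.popleft()              # log uses random queue ordering
        sample = cluster[:sample_size(cluster)]
        oracle(sample)                         # manual oracle classifications
        labelled += len(sample)
        rest = cluster[len(sample):]           # labelled vectors are deleted
        if rest:
            queue.extend(split(rest))          # push sub-clusters onto the queue
    return labelled
```

The "Reached end of manual classification budget" lines correspond to the `labelled < budget` condition failing mid-run, leaving unprocessed clusters in the queue.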
Analysing file: diverg(20)448_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 448), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)448_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 529
Number of entity ID pairs that occurred more than once: 0
Analyse set of 529 weight vectors
Containing 225 true matches and 304 true non-matches
(42.53% true matches)
Identified 490 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 471 (96.12%)
2 : 16 (3.27%)
3 : 2 (0.41%)
20 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 490 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 301
Removed 1 non-pure weight vector
Final number of weight vectors to use: 528
Number of unique weight vectors: 490
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(490, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 490 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 490 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 36 matches and 44 non-matches
Purity of oracle classification: 0.550
Entropy of oracle classification: 0.993
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 410 weight vectors
Based on 36 matches and 44 non-matches
Classified 173 matches and 237 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(173, 0.55, 0.9927744539878084, 0.45)
(237, 0.55, 0.9927744539878084, 0.45)
Current size of match and non-match training data sets: 36 / 44
Selected cluster with (queue ordering: random):
- Purity 0.55 and entropy 0.99
- Size 237 weight vectors
- Estimated match proportion 0.450
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 237 vectors
The selected farthest weight vectors are:
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.818, 0.727, 0.438, 0.375, 0.400] (False)
[1.000, 0.000, 0.800, 0.636, 0.563, 0.545, 0.722] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 1 match and 67 non-matches
Purity of oracle classification: 0.985
Entropy of oracle classification: 0.111
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)65_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 65), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)65_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 845
Number of entity ID pairs that occurred more than once: 0
Analyse set of 845 weight vectors
Containing 227 true matches and 618 true non-matches
(26.86% true matches)
Identified 788 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 751 (95.30%)
2 : 34 (4.31%)
3 : 2 (0.25%)
20 : 1 (0.13%)
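The occurrence table above is a two-level count: how often each distinct weight vector occurs, then how many vectors share each occurrence count. A hedged sketch, assuming vectors are given as lists of floats (the helper name `occurrence_distribution` is hypothetical):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # First count how often each distinct vector occurs (tuples are
    # hashable), then count how many vectors occur once, twice, ...
    per_vector = Counter(map(tuple, weight_vectors))
    return Counter(per_vector.values())

dist = occurrence_distribution([[0.5, 1.0], [0.5, 1.0], [0.2, 0.3]])
print(dist)  # one distinct vector occurs twice, one occurs once
```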
Identified 1 non-pure unique weight vector (from 788 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 597
Removed 1 non-pure weight vector
Final number of weight vectors to use: 844
Number of unique weight vectors: 788
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(788, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 788 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 788 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
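"Farthest first selection" as logged here is commonly implemented as the greedy farthest-first traversal; a sketch under that assumption (the seeding strategy and distance metric of the original program are unknown, Euclidean distance and first-vector seeding are my choices):

```python
import math

def farthest_first(vectors, k):
    # Greedy farthest-first traversal: seed with the first vector, then
    # repeatedly add the vector whose minimum Euclidean distance to the
    # already-selected set is largest.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while remaining and len(selected) < k:
        far = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        remaining.remove(far)
        selected.append(far)
    return selected
```

This yields a sample spread over the whole cluster, which is why the selected vectors listed above mix extreme corners of the weight space.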
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 703 weight vectors
Based on 29 matches and 56 non-matches
Classified 162 matches and 541 non-matches
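The splitting step above trains a classifier on the 29 + 56 oracle-labelled vectors and partitions the cluster's 703 remaining vectors by prediction. A hedged sketch using scikit-learn's `SVC` (the kernel and parameters of the original program are unknown; `svm_split` is a hypothetical name):

```python
from sklearn.svm import SVC

def svm_split(sample_vectors, sample_labels, cluster_vectors):
    # Train on the oracle-labelled sample (1 = match, 0 = non-match),
    # then partition the remaining cluster vectors by SVM prediction.
    clf = SVC(kernel="linear")
    clf.fit(sample_vectors, sample_labels)
    predictions = clf.predict(cluster_vectors)
    matches = [v for v, p in zip(cluster_vectors, predictions) if p == 1]
    non_matches = [v for v, p in zip(cluster_vectors, predictions) if p == 0]
    return matches, non_matches
```

The two resulting subsets become the new clusters placed on the queue, as the "Loop 2: Queue length: 2" line below shows.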
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(162, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(541, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 162 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 162 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)954_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 954), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)954_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 861
Number of entity ID pairs that occurred more than once: 0
Analyse set of 861 weight vectors
Containing 227 true matches and 634 true non-matches
(26.36% true matches)
Identified 804 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 767 (95.40%)
2 : 34 (4.23%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 804 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 613
Removed 1 non-pure weight vector
Final number of weight vectors to use: 860
Number of unique weight vectors: 804
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(804, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 804 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 804 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 718 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 565 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(565, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 153 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 153 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)310_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 310), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)310_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 902
Number of entity ID pairs that occurred more than once: 0
Analyse set of 902 weight vectors
Containing 178 true matches and 724 true non-matches
(19.73% true matches)
Identified 863 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 833 (96.52%)
2 : 27 (3.13%)
3 : 2 (0.23%)
9 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 863 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 159
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 703
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 893
Number of unique weight vectors: 862
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(862, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 862 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 862 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 776 weight vectors
Based on 24 matches and 62 non-matches
Classified 94 matches and 682 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(682, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 94 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 94 vectors
The selected farthest weight vectors are:
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 43 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(15)63_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (15, 1 - acm diverg, 63), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)63_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 562
Number of entity ID pairs that occurred more than once: 0
Analyse set of 562 weight vectors
Containing 173 true matches and 389 true non-matches
(30.78% true matches)
Identified 544 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 535 (98.35%)
2 : 6 (1.10%)
3 : 2 (0.37%)
9 : 1 (0.18%)
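The occurrence table above can be reproduced with two nested counts: first count how often each exact weight vector appears, then count how many unique vectors share each occurrence count. A sketch (the function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Frequency distribution of how often each weight vector occurs,
    mirroring the "Occurrence : Number of weight vectors" table above."""
    # Count identical vectors (tuples are hashable, lists are not).
    vector_counts = Counter(tuple(v) for v in weight_vectors)
    # Then count how many unique vectors share each occurrence count.
    return Counter(vector_counts.values())

vectors = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.9]]
# occurrence_distribution(vectors) -> Counter({1: 2, 2: 1})
```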
Identified 1 non-pure unique weight vector (from 544 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 155
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 388
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 553
Number of unique weight vectors: 543
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(543, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 543 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 543 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 26 matches and 55 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.905
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 462 weight vectors
Based on 26 matches and 55 non-matches
Classified 114 matches and 348 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(114, 0.6790123456790124, 0.9054522631867894, 0.32098765432098764)
(348, 0.6790123456790124, 0.9054522631867894, 0.32098765432098764)
Current size of match and non-match training data sets: 26 / 55
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 348 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 67
Farthest first selection of 67 weight vectors from 348 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.565, 0.667, 0.600, 0.412, 0.381] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.455, 0.714, 0.429, 0.550, 0.647] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 67 weight vectors
The oracle will correctly classify 67 weight vectors and wrongly classify 0
Classified 11 matches and 56 non-matches
Purity of oracle classification: 0.836
Entropy of oracle classification: 0.644
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 67 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(10)711_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 711), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)711_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 645
Number of entity ID pairs that occurred more than once: 0
Analyse set of 645 weight vectors
Containing 197 true matches and 448 true non-matches
(30.54% true matches)
Identified 590 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 554 (93.90%)
2 : 33 (5.59%)
3 : 2 (0.34%)
19 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 590 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 162
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 427
Removed 1 non-pure weight vector
Final number of weight vectors to use: 644
Number of unique weight vectors: 590
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(590, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 590 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 590 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 28 matches and 54 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 508 weight vectors
Based on 28 matches and 54 non-matches
Classified 181 matches and 327 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(181, 0.6585365853658537, 0.9262122127346665, 0.34146341463414637)
(327, 0.6585365853658537, 0.9262122127346665, 0.34146341463414637)
Current size of match and non-match training data sets: 28 / 54
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 181 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 59
Farthest first selection of 59 weight vectors from 181 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.530, 1.000, 0.159, 0.086, 0.182, 0.159, 0.163] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 59 weight vectors
The oracle will correctly classify 59 weight vectors and wrongly classify 0
Classified 42 matches and 17 non-matches
Purity of oracle classification: 0.712
Entropy of oracle classification: 0.866
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 17
Number of false non-matches: 0
Deleted 59 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(10)784_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990385
recall 0.344482
f-measure 0.511166
da 104
dm 0
ndm 0
tp 103
fp 1
tn 4.76529e+07
fn 196
Name: (10, 1 - acm diverg, 784), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)784_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 305
Number of entity ID pairs that occurred more than once: 0
Analyse set of 305 weight vectors
Containing 146 true matches and 159 true non-matches
(47.87% true matches)
Identified 288 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 279 (96.88%)
2 : 6 (2.08%)
3 : 2 (0.69%)
8 : 1 (0.35%)
Identified 1 non-pure unique weight vector (from 288 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 131
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 156
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 297
Number of unique weight vectors: 287
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(287, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 287 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 72
Perform initial selection using "far" method
Farthest first selection of 72 weight vectors from 287 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 34 matches and 38 non-matches
Purity of oracle classification: 0.528
Entropy of oracle classification: 0.998
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 215 weight vectors
Based on 34 matches and 38 non-matches
Classified 130 matches and 85 non-matches
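The SVM step trains on the oracle-labelled vectors (34 matches, 38 non-matches) and splits the 215 remaining cluster members by predicted class. A sketch using scikit-learn on random stand-in data; the kernel and all parameters here are assumptions, since the log does not show the tool's actual settings:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-ins for the oracle-labelled training vectors (7 similarity weights each,
# matching the 34 + 38 = 72 labels and 215 remaining vectors in the log).
train_x = rng.random((72, 7))
train_y = np.array([1] * 34 + [0] * 38)   # 1 = match, 0 = non-match
remaining = rng.random((215, 7))          # unlabelled vectors left in the cluster

svm = SVC(kernel="linear").fit(train_x, train_y)
pred = svm.predict(remaining)

# The cluster splits into a predicted-match and a predicted-non-match part,
# which become the two new clusters pushed onto the queue.
matches = remaining[pred == 1]
non_matches = remaining[pred == 0]
print(len(matches) + len(non_matches))    # 215
```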
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 72
Size, purity, entropy, and estimated match proportion of clusters in queue:
(130, 0.5277777777777778, 0.9977724720899821, 0.4722222222222222)
(85, 0.5277777777777778, 0.9977724720899821, 0.4722222222222222)
Current size of match and non-match training data sets: 34 / 38
Selected cluster (queue ordering: random) with:
- Purity 0.53 and entropy 1.00
- Size 130 weight vectors
- Estimated match proportion 0.472
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 130 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[0.488, 1.000, 0.231, 0.167, 0.105, 0.122, 0.000] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.619, 1.000, 0.103, 0.163, 0.129, 0.146, 0.213] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.800, 1.000, 0.167, 0.180, 0.151, 0.147, 0.203] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 37 matches and 18 non-matches
Purity of oracle classification: 0.673
Entropy of oracle classification: 0.912
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 18
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
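The "farthest first" selection used in each loop is the greedy k-center heuristic: starting from one vector, repeatedly pick the candidate whose distance to its nearest already-selected vector is largest. A minimal sketch of one plausible implementation (the tool's actual starting vector and distance metric are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy k-center selection: start from the first vector, then repeatedly
    take the candidate farthest from its nearest already-selected vector."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # For each candidate, the distance to its closest selected vector.
        far = max(remaining,
                  key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(far)
        remaining.remove(far)
    return selected

vecs = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
print(farthest_first(vecs, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
```

This spreads the sample across the weight-vector space, which is why the selected lists above mix high-similarity (likely match) and low-similarity (likely non-match) vectors.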
Reached end of manual classification budget
104.0
Analysing the file: diverg(10)609_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.980583
recall 0.337793
f-measure 0.502488
da 103
dm 0
ndm 0
tp 101
fp 2
tn 4.76529e+07
fn 198
Name: (10, 1 - acm diverg, 609), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)609_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 237
Number of entity ID pairs that occurred more than once: 0
Analyse set of 237 weight vectors
Containing 138 true matches and 99 true non-matches
(58.23% true matches)
Identified 222 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 211 (95.05%)
2 : 8 (3.60%)
3 : 2 (0.90%)
4 : 1 (0.45%)
Identified 0 non-pure unique weight vectors (from 222 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 125
0.000 : 97
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 237
Number of unique weight vectors: 222
Time to load and analyse the weight vector file: 0.00 sec
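The analysis block above groups identical weight vectors, builds the occurrence frequency distribution, and computes each unique vector's pureness (the fraction of true matches among its occurrences; non-pure vectors can then be removed). A sketch on hypothetical data, assuming each vector carries a boolean true-match label:

```python
from collections import Counter, defaultdict

# Hypothetical (weight_vector, is_true_match) pairs standing in for a loaded file.
data = [
    ((1.0, 0.5), True), ((1.0, 0.5), True),
    ((0.2, 0.1), False),
    ((0.9, 0.8), True), ((0.9, 0.8), False),
]

occurrences = Counter(vec for vec, _ in data)   # how often each unique vector occurs
match_counts = defaultdict(int)
for vec, is_match in data:
    match_counts[vec] += int(is_match)

# Frequency distribution: occurrence count -> number of unique vectors with that count.
freq = Counter(occurrences.values())
# Pureness per unique vector: fraction of true matches among its occurrences.
pureness = {vec: match_counts[vec] / n for vec, n in occurrences.items()}

print(dict(freq))            # {2: 2, 1: 1}
print(pureness[(0.9, 0.8)])  # 0.5 -> a non-pure vector (would be removed)
```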
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(222, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 222 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 67
Perform initial selection using "far" method
Farthest first selection of 67 weight vectors from 222 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.344, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 67 weight vectors
The oracle will correctly classify 67 weight vectors and wrongly classify 0
Classified 30 matches and 37 non-matches
Purity of oracle classification: 0.552
Entropy of oracle classification: 0.992
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 37
Number of false non-matches: 0
Deleted 67 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 155 weight vectors
Based on 30 matches and 37 non-matches
Classified 101 matches and 54 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 67
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.5522388059701493, 0.99211169200215, 0.44776119402985076)
(54, 0.5522388059701493, 0.99211169200215, 0.44776119402985076)
Current size of match and non-match training data sets: 30 / 37
Selected cluster (queue ordering: random) with:
- Purity 0.55 and entropy 0.99
- Size 101 weight vectors
- Estimated match proportion 0.448
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 43 matches and 6 non-matches
Purity of oracle classification: 0.878
Entropy of oracle classification: 0.536
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing the file: diverg(10)726_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990566
recall 0.351171
f-measure 0.518519
da 106
dm 0
ndm 0
tp 105
fp 1
tn 4.76529e+07
fn 194
Name: (10, 1 - acm diverg, 726), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)726_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 634
Number of entity ID pairs that occurred more than once: 0
Analyse set of 634 weight vectors
Containing 154 true matches and 480 true non-matches
(24.29% true matches)
Identified 598 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 570 (95.32%)
2 : 25 (4.18%)
3 : 2 (0.33%)
8 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 598 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 459
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 626
Number of unique weight vectors: 597
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(597, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 597 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 597 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 26 matches and 57 non-matches
Purity of oracle classification: 0.687
Entropy of oracle classification: 0.897
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 514 weight vectors
Based on 26 matches and 57 non-matches
Classified 89 matches and 425 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(89, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
(425, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
Current size of match and non-match training data sets: 26 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.90
- Size 425 weight vectors
- Estimated match proportion 0.313
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 425 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
106.0
Analysing the file: diverg(15)970_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 970), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)970_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1064
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1064 weight vectors
Containing 219 true matches and 845 true non-matches
(20.58% true matches)
Identified 1008 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 972 (96.43%)
2 : 33 (3.27%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1008 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 824
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1063
Number of unique weight vectors: 1008
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1008, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1008 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1008 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
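The "far" initial selection above is a farthest-first traversal: seed with one vector, then repeatedly add the candidate whose minimum distance to the already-selected set is largest. The log does not show the implementation; a minimal sketch, assuming Euclidean distance between weight vectors, is:

```python
import math

def farthest_first(vectors, k):
    """Greedy max-min selection: start from the first vector, then repeatedly
    add the candidate whose minimum distance to the selected set is largest."""
    selected = [vectors[0]]
    candidates = list(vectors[1:])
    while len(selected) < k and candidates:
        best = max(candidates, key=lambda v: min(math.dist(v, s) for s in selected))
        candidates.remove(best)
        selected.append(best)
    return selected

# Toy 2-D example: the distant outlier is picked immediately after the seed.
sample = farthest_first([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)], 3)
```

This greedy strategy spreads the sample across the cluster, which is why the selected vectors above mix clear matches, clear non-matches, and borderline cases.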
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
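The purity and entropy figures above are consistent with purity being the majority-class fraction of the oracle-labelled sample and entropy the binary Shannon entropy of its match proportion. A small sketch reproducing the Loop 1 numbers (31 matches, 56 non-matches):

```python
import math

def cluster_stats(n_matches, n_non_matches):
    # Majority-class fraction, binary entropy, and estimated match proportion.
    total = n_matches + n_non_matches
    p = n_matches / total                            # estimated match proportion
    purity = max(n_matches, n_non_matches) / total   # majority-class fraction
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy, p

purity, entropy, p = cluster_stats(31, 56)  # the Loop 1 oracle result above
```

These values (0.644, 0.940, 0.356) are exactly the cluster statistics carried into the Loop 2 queue.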
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 921 weight vectors
Based on 31 matches and 56 non-matches
Classified 325 matches and 596 non-matches
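The split step trains a classifier on the oracle-labelled vectors and partitions the remaining unlabelled vectors by its predictions, yielding the two child clusters queued in the next loop. A minimal sketch using scikit-learn's SVC (assuming scikit-learn is available; the training vectors and labels below are synthetic stand-ins, not taken from the log):

```python
from sklearn.svm import SVC

# Oracle-labelled vectors: 1 = match, 0 = non-match (synthetic example data).
train_X = [[0.9, 1.0], [0.8, 0.9], [0.2, 0.1], [0.1, 0.0]]
train_y = [1, 1, 0, 0]

clf = SVC(kernel="linear")
clf.fit(train_X, train_y)

# Classify the remaining vectors and split the cluster in two.
unlabelled = [[0.85, 0.95], [0.15, 0.05]]
pred = clf.predict(unlabelled)
match_cluster = [v for v, p in zip(unlabelled, pred) if p == 1]
non_match_cluster = [v for v, p in zip(unlabelled, pred) if p == 0]
```

In the run above, the same step splits the 921 remaining vectors into predicted clusters of 325 matches and 596 non-matches.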
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(325, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(596, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 325 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 325 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 41 matches and 28 non-matches
Purity of oracle classification: 0.594
Entropy of oracle classification: 0.974
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(20)887_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 887), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)887_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 29 matches and 56 non-matches
Classified 146 matches and 538 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(538, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 538 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 538 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.407, 0.643, 0.667, 0.500, 0.563] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 9 matches and 65 non-matches
Purity of oracle classification: 0.878
Entropy of oracle classification: 0.534
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)814_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.977273
recall 0.431438
f-measure 0.598608
da 132
dm 0
ndm 0
tp 129
fp 3
tn 4.76529e+07
fn 170
Name: (10, 1 - acm diverg, 814), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)814_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 784
Number of entity ID pairs that occurred more than once: 0
Analyse set of 784 weight vectors
Containing 127 true matches and 657 true non-matches
(16.20% true matches)
Identified 753 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 725 (96.28%)
2 : 25 (3.32%)
3 : 3 (0.40%)
Identified 0 non-pure unique weight vectors (from 753 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 116
0.000 : 637
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 784
Number of unique weight vectors: 753
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(753, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 753 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 753 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 26 matches and 59 non-matches
Purity of oracle classification: 0.694
Entropy of oracle classification: 0.888
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 668 weight vectors
Based on 26 matches and 59 non-matches
Classified 113 matches and 555 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(113, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
(555, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
Current size of match and non-match training data sets: 26 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 555 weight vectors
- Estimated match proportion 0.306
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 555 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
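The "farthest first" sampling used in each loop above greedily picks, at every step, the weight vector whose minimum distance to the already selected set is largest, so the sample spreads across the cluster. A sketch assuming Euclidean distance (the metric actually used is not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector, then
    repeatedly add the vector that is farthest from the selected set."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```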
Reached end of manual classification budget
132.0
Analysing the file: diverg(15)847_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 847), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)847_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 597
Number of entity ID pairs that occurred more than once: 0
Analyse set of 597 weight vectors
Containing 214 true matches and 383 true non-matches
(35.85% true matches)
Identified 563 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 548 (97.34%)
2 : 12 (2.13%)
3 : 2 (0.36%)
19 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 563 unique weight vectors)
Pureness (as the percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 180
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 382
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 596
Number of unique weight vectors: 563
Time to load and analyse the weight vector file: 0.01 sec
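The load-and-analyse step above counts how often each unique weight vector occurs and flags "non-pure" vectors, i.e. identical vectors that carry both match and non-match labels. A sketch of that analysis (names are illustrative, not from the original code):

```python
from collections import Counter

def analyse(vectors, labels):
    """Return (occurrence distribution, list of non-pure unique vectors).

    A unique vector is non-pure when its copies carry a mix of match
    (True) and non-match (False) labels."""
    freq = Counter(tuple(v) for v in vectors)
    # occurrence count -> number of unique vectors occurring that often
    distribution = Counter(freq.values())
    match_count = Counter()
    for v, lab in zip(vectors, labels):
        if lab:
            match_count[tuple(v)] += 1
    # pureness = fraction of a unique vector's copies labelled match
    pureness = {v: match_count[v] / n for v, n in freq.items()}
    non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]
    return distribution, non_pure
```

Vectors flagged as non-pure are the ones removed before clustering starts, leaving only vectors with pureness 1.000 or 0.000.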
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(563, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 563 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 563 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 27 matches and 55 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 481 weight vectors
Based on 27 matches and 55 non-matches
Classified 142 matches and 339 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
(339, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
Current size of match and non-match training data sets: 27 / 55
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 142 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 49 matches and 4 non-matches
Purity of oracle classification: 0.925
Entropy of oracle classification: 0.386
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)627_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 627), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)627_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 970
Number of entity ID pairs that occurred more than once: 0
Analyse set of 970 weight vectors
Containing 219 true matches and 751 true non-matches
(22.58% true matches)
Identified 915 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 879 (96.07%)
2 : 33 (3.61%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 915 unique weight vectors)
Pureness (as the percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 730
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 969
Number of unique weight vectors: 915
Time to load and analyse the weight vector file: 0.05 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(915, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 915 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 915 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 828 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 705 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(705, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 705 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 705 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(10)319_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 319), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)319_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 605
Number of entity ID pairs that occurred more than once: 0
Analyse set of 605 weight vectors
Containing 154 true matches and 451 true non-matches
(25.45% true matches)
Identified 569 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 541 (95.08%)
2 : 25 (4.39%)
3 : 2 (0.35%)
8 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 569 unique weight vectors)
Pureness (as the percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 430
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 597
Number of unique weight vectors: 568
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(568, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 568 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 568 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
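The "far" method logged above is a greedy farthest-first traversal: starting from a seed vector, it repeatedly selects the vector whose minimum distance to the already-selected set is largest, which favours spread-out, boundary-case weight vectors for the oracle. A minimal sketch (the seed choice and the Euclidean metric are assumptions, not taken from the original code):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection: each new pick maximises its
    minimum Euclidean distance to the vectors already selected."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # seed choice (first vector) is an assumption
    min_dist = [dist(vectors[0], v) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: min_dist[j])
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            min_dist[j] = min(min_dist[j], dist(vectors[i], v))
    return selected
```

Each pass keeps, per vector, its distance to the nearest selected vector, so selecting 82 out of 568 vectors costs 82 distance sweeps over the cluster.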
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 25 matches and 57 non-matches
Purity of oracle classification: 0.695
Entropy of oracle classification: 0.887
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
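The purity and entropy values reported above follow from the 25/82 match split: purity is the majority-class fraction (57/82 ≈ 0.695) and entropy is the binary Shannon entropy, in bits, of the match proportion (25/82 ≈ 0.305). A sketch of the computation:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Return (purity, entropy, match_proportion) for a cluster:
    purity is the majority-class fraction, entropy the binary
    Shannon entropy (in bits) of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return max(p, 1 - p), entropy, p
```

`cluster_stats(25, 57)` reproduces the `(purity, entropy, match proportion)` triple attached to both child clusters in the Loop 2 queue.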
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 486 weight vectors
Based on 25 matches and 57 non-matches
Classified 155 matches and 331 non-matches
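The split step trains a binary classifier on the 25 match and 57 non-match oracle labels and partitions the 486 remaining vectors by its predictions, yielding the two child clusters queued in Loop 2. The log uses an SVM; as a dependency-free stand-in with the same shape (a linear decision rule learned from the two labelled sets), a nearest-centroid partition could be sketched as:

```python
def split_by_classifier(match_train, non_match_train, unlabelled):
    """Partition `unlabelled` into (predicted_matches, predicted_non_matches)
    with a nearest-centroid rule -- a simple linear stand-in for the
    SVM classifier used in the log, not the original implementation."""
    def centroid(vecs):
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cm, cn = centroid(match_train), centroid(non_match_train)
    matches, non_matches = [], []
    for v in unlabelled:
        (matches if sq_dist(v, cm) < sq_dist(v, cn) else non_matches).append(v)
    return matches, non_matches
```

The two returned lists correspond to the "Classified N matches and M non-matches" line; each becomes a new cluster that initially inherits the parent's purity and entropy estimates.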
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(155, 0.6951219512195121, 0.8871723027673717, 0.3048780487804878)
(331, 0.6951219512195121, 0.8871723027673717, 0.3048780487804878)
Current size of match and non-match training data sets: 25 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.89
- Size 331 weight vectors
- Estimated match proportion 0.305
Sample size for this cluster: 65
Farthest first selection of 65 weight vectors from 331 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.269, 0.478, 0.750, 0.385, 0.455] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.538, 0.600, 0.471, 0.632, 0.688] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.500, 0.571, 0.467, 0.467, 0.389] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.179, 0.500, 0.412, 0.357] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.800, 0.667, 0.381, 0.550, 0.429] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.571, 0.286, 0.333, 0.571, 0.600] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 65 weight vectors
The oracle will correctly classify 65 weight vectors and wrongly classify 0
Classified 0 matches and 65 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 65 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing file: diverg(10)707_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990385
recall 0.344482
f-measure 0.511166
da 104
dm 0
ndm 0
tp 103
fp 1
tn 4.76529e+07
fn 196
Name: (10, 1 - acm diverg, 707), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)707_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 343
Number of entity ID pairs that occurred more than once: 0
Analyse set of 343 weight vectors
Containing 154 true matches and 189 true non-matches
(44.90% true matches)
Identified 325 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 315 (96.92%)
2 : 7 (2.15%)
3 : 2 (0.62%)
8 : 1 (0.31%)
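Such a frequency table can be built by counting exact duplicates of the weight-vector tuples and then counting the counts, e.g. with `collections.Counter` (a sketch, not the original code):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence-count -> number of distinct weight vectors that
    occur exactly that many times (the log's frequency table)."""
    per_vector = Counter(tuple(v) for v in vectors)   # vector -> occurrences
    return Counter(per_vector.values())               # occurrences -> how many vectors
```

Here 315 unique vectors occur once, 7 twice, 2 three times and 1 eight times, i.e. 325 unique vectors covering 315 + 14 + 6 + 8 = 343 vectors in total.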
Identified 1 non-pure unique weight vector (from 325 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 186
Removed 8 non-pure weight vectors
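A weight vector is non-pure when identical feature values appear with both match and non-match labels; its pureness is the fraction of its occurrences labelled as matches. The sketch below drops every occurrence of a non-pure vector, matching the "all weight vectors with this pureness to be removed" message here (other runs in this log remove only the minority-class occurrences):

```python
def remove_non_pure(labelled_vectors):
    """Keep only weight vectors whose match label is consistent across
    all occurrences (pureness exactly 0.0 or 1.0).  Input: a list of
    (vector_tuple, is_match) pairs; every occurrence of a non-pure
    vector is dropped in this variant."""
    seen = {}
    for vec, is_match in labelled_vectors:
        seen.setdefault(vec, set()).add(is_match)
    pure = {vec for vec, labels in seen.items() if len(labels) == 1}
    return [(vec, m) for vec, m in labelled_vectors if vec in pure]
```

With one unique vector occurring 8 times at pureness 0.875 (7 matches, 1 non-match), dropping all of its occurrences turns 343 vectors into the 335 used below.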
Final number of weight vectors to use: 335
Number of unique weight vectors: 324
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(324, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 324 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 74
Perform initial selection using "far" method
Farthest first selection of 74 weight vectors from 324 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 28 matches and 46 non-matches
Purity of oracle classification: 0.622
Entropy of oracle classification: 0.957
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 46
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 250 weight vectors
Based on 28 matches and 46 non-matches
Classified 99 matches and 151 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 74
Size, purity, entropy, and estimated match proportion of clusters in queue:
(99, 0.6216216216216216, 0.9568886656798214, 0.3783783783783784)
(151, 0.6216216216216216, 0.9568886656798214, 0.3783783783783784)
Current size of match and non-match training data sets: 28 / 46
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 151 weight vectors
- Estimated match proportion 0.378
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 11 matches and 46 non-matches
Purity of oracle classification: 0.807
Entropy of oracle classification: 0.708
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 46
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
104.0
Analysing file: diverg(20)124_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 124), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)124_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
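The purity and entropy figures reported above follow directly from the oracle's match/non-match counts. A minimal sketch (function names are illustrative, not from the original code):

```python
import math

def purity(n_match, n_nonmatch):
    # fraction of the majority class in the classified sample
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    # Shannon entropy (base 2) of the match / non-match split
    total = n_match + n_nonmatch
    h = 0.0
    for n in (n_match, n_nonmatch):
        p = n / total
        if p > 0:
            h -= p * math.log2(p)
    return h

# the oracle above classified 14 matches and 54 non-matches
print(round(purity(14, 54), 3))   # 0.794
print(round(entropy(14, 54), 3))  # 0.734
```

These reproduce the "Purity of oracle classification: 0.794" and "Entropy of oracle classification: 0.734" lines above exactly.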
39.0
Analysing the file: diverg(15)815_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 815), dtype: object
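The precision, recall and f-measure values in the Series dump above follow from the reported tp/fp/fn counts. A small sketch of that computation (the `prf` helper is illustrative):

```python
def prf(tp, fp, fn):
    # precision, recall and F1 from confusion-matrix counts
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# tp=42, fp=0, fn=257 as reported above
p, r, f = prf(42, 0, 257)
print(round(p, 6), round(r, 6), round(f, 6))  # 1.0 0.140468 0.246334
```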
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)815_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 396
Number of entity ID pairs that occurred more than once: 0
Analyse set of 396 weight vectors
Containing 216 true matches and 180 true non-matches
(54.55% true matches)
Identified 363 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 347 (95.59%)
2 : 13 (3.58%)
3 : 2 (0.55%)
17 : 1 (0.28%)
Identified 1 non-pure unique weight vector (from 363 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 183
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 179
Removed 1 non-pure weight vector
Final number of weight vectors to use: 395
Number of unique weight vectors: 363
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
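The uniqueness, frequency-distribution and pureness analysis above can be sketched with a `Counter` over labelled vectors. The toy data below is illustrative only; the real input is the similarity vectors loaded from the CSV:

```python
from collections import Counter

# toy labelled weight vectors: (vector, is_match)
pairs = [
    ((1.0, 0.9), True), ((1.0, 0.9), True),   # duplicate, pure match
    ((0.2, 0.1), False),
    ((0.5, 0.5), True), ((0.5, 0.5), False),  # non-pure vector
]

freq = Counter(v for v, _ in pairs)            # occurrence counts
matches = Counter(v for v, m in pairs if m)    # match counts per vector
pureness = {v: matches[v] / n for v, n in freq.items()}
non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]

print(len(freq))   # 3 unique vectors
print(non_pure)    # [(0.5, 0.5)]
```

Removing the minority-class copies of each non-pure vector (as the log reports) then leaves only vectors whose pureness is exactly 0.0 or 1.0.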
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(363, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 363 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 363 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
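The "farthest first" selection listed above can be sketched as the classic greedy traversal: start from one vector and repeatedly add the vector whose minimum Euclidean distance to the already-selected set is largest. The seed point and distance metric here are assumptions, not taken from the original code:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly pick the vector
    farthest (by min distance) from the current selection."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # min distance of every vector to the selected set so far
    d_min = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: d_min[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            d_min[i] = min(d_min[i], dist(v, vectors[nxt]))
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(pts, 3))  # [0, 2, 3]
```

This is why the selected samples above are so spread out: each new vector maximises its distance to everything already chosen, which tends to mix matches and non-matches.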
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 42 matches and 34 non-matches
Purity of oracle classification: 0.553
Entropy of oracle classification: 0.992
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 34
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 287 weight vectors
Based on 42 matches and 34 non-matches
Classified 146 matches and 141 non-matches
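The SVM step above splits the 287 remaining weight vectors using the 42 oracle-labelled matches and 34 non-matches as training data. The original presumably calls a library SVM; as a self-contained stand-in, here is a minimal Pegasos-style sub-gradient linear SVM on toy similarity vectors (data and hyperparameters are illustrative):

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a soft-margin linear SVM.
    X: feature tuples (with a constant 1.0 appended as bias); y: +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularise)
            if margin < 1.0:                          # hinge-loss violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def classify(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# toy similarity vectors: high values = match (+1), low = non-match (-1)
X = [(0.9, 0.8, 1.0), (0.85, 0.95, 1.0), (1.0, 0.9, 1.0),
     (0.1, 0.2, 1.0), (0.2, 0.15, 1.0), (0.3, 0.1, 1.0)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([classify(w, x) for x in X])  # matches the training labels
```

The trained separator then plays the role of the split above: each unlabelled vector is assigned to the "match" or "non-match" child cluster by the sign of its decision value.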
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.5526315789473685, 0.9919924034538556, 0.5526315789473685)
(141, 0.5526315789473685, 0.9919924034538556, 0.5526315789473685)
Current size of match and non-match training data sets: 42 / 34
Selected cluster (queue ordering: random) with:
- Purity 0.55 and entropy 0.99
- Size 146 weight vectors
- Estimated match proportion 0.553
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 146 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 52 matches and 6 non-matches
Purity of oracle classification: 0.897
Entropy of oracle classification: 0.480
Number of true matches: 52
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(10)274_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984615
recall 0.214047
f-measure 0.351648
da 65
dm 0
ndm 0
tp 64
fp 1
tn 4.76529e+07
fn 235
Name: (10, 1 - acm diverg, 274), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)274_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 587
Number of entity ID pairs that occurred more than once: 0
Analyse set of 587 weight vectors
Containing 191 true matches and 396 true non-matches
(32.54% true matches)
Identified 558 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 543 (97.31%)
2 : 12 (2.15%)
3 : 2 (0.36%)
14 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 558 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 164
0.929 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 393
Removed 1 non-pure weight vector
Final number of weight vectors to use: 586
Number of unique weight vectors: 558
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(558, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 558 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 558 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 31 matches and 51 non-matches
Purity of oracle classification: 0.622
Entropy of oracle classification: 0.957
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 476 weight vectors
Based on 31 matches and 51 non-matches
Classified 136 matches and 340 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
(340, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
Current size of match and non-match training data sets: 31 / 51
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 340 weight vectors
- Estimated match proportion 0.378
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 340 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[0.790, 0.000, 0.636, 0.619, 0.429, 0.450, 0.609] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.778, 0.429, 0.571, 0.750, 0.600] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.481, 0.643, 0.667, 0.350, 0.643] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 3 matches and 68 non-matches
Purity of oracle classification: 0.958
Entropy of oracle classification: 0.253
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 68
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
65.0
Analysing the file: diverg(10)346_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 346), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)346_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 730
Number of entity ID pairs that occurred more than once: 0
Analyse set of 730 weight vectors
Containing 220 true matches and 510 true non-matches
(30.14% true matches)
Identified 694 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 675 (97.26%)
2 : 16 (2.31%)
3 : 2 (0.29%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 694 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 507
Removed 1 non-pure weight vector
Final number of weight vectors to use: 729
Number of unique weight vectors: 694
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(694, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 694 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 694 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and misclassify 0
Classified 33 matches and 51 non-matches
Purity of oracle classification: 0.607
Entropy of oracle classification: 0.967
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
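The purity and entropy values reported throughout this log can be reproduced from the match/non-match counts. A minimal sketch, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy of the split (both assumptions are consistent with the numbers above):

```python
import math

def purity_entropy(n_matches, n_non_matches):
    """Majority-class purity and binary Shannon entropy of a match/non-match split."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 33 matches and 51 non-matches, as classified by the oracle above
purity, entropy = purity_entropy(33, 51)
# purity = 51/84 ≈ 0.607, entropy ≈ 0.967, matching the log
```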
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 610 weight vectors
Based on 33 matches and 51 non-matches
Classified 159 matches and 451 non-matches
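The log reports an SVM trained on the oracle-labelled vectors and then applied to the remaining unlabelled ones, but does not name the implementation. A rough stand-in using a Pegasos-style stochastic subgradient linear SVM (the toy data below is illustrative, not taken from the actual run):

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01, seed=0):
    """Pegasos-style training of a linear SVM; X: feature vectors, y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active: step towards the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    """Classify x as match (+1) or non-match (-1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1

# Toy example: "matches" cluster near (1, 1), "non-matches" near (0, 0)
X = [[1.0, 1.0], [0.9, 0.8], [0.1, 0.2], [0.0, 0.1]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

In the run above the training set would be the 33 matches and 51 non-matches just classified, and `predict` would split the remaining 610 vectors into the two child clusters.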
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(159, 0.6071428571428571, 0.9666186325481028, 0.39285714285714285)
(451, 0.6071428571428571, 0.9666186325481028, 0.39285714285714285)
Current size of match and non-match training data sets: 33 / 51
Selected cluster with (queue ordering: random):
- Purity 0.61 and entropy 0.97
- Size 451 weight vectors
- Estimated match proportion 0.393
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 451 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and misclassify 0
Classified 3 matches and 73 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.240
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)533_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 533), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)533_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 351
Number of entity ID pairs that occurred more than once: 0
Analyse set of 351 weight vectors
Containing 192 true matches and 159 true non-matches
(54.70% true matches)
Identified 319 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 304 (95.30%)
2 : 12 (3.76%)
3 : 2 (0.63%)
17 : 1 (0.31%)
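A distribution like the one above can be computed by counting how often each exact weight vector occurs, then counting the counts. A sketch (the vectors below are illustrative, not from the actual data):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of distinct vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in weight_vectors)  # vector -> occurrence count
    return Counter(per_vector.values())                     # count -> number of vectors

vectors = [
    [1.0, 0.5], [1.0, 0.5],              # one vector occurring twice
    [0.2, 0.3],                          # one vector occurring once
    [0.9, 0.9], [0.9, 0.9], [0.9, 0.9],  # one vector occurring three times
]
dist = occurrence_distribution(vectors)
# dist[1] == 1, dist[2] == 1, dist[3] == 1
```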
Identified 1 non-pure unique weight vector (from 319 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 162
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 156
Removed 1 non-pure weight vector
Final number of weight vectors to use: 350
Number of unique weight vectors: 319
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(319, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 319 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 74
Perform initial selection using "far" method
Farthest first selection of 74 weight vectors from 319 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
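The "farthest first" selection used above is a greedy traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch assuming Euclidean distance (the log does not state the metric or the starting vector):

```python
def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: return indices of k selected vectors."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    selected = [start]
    # minimum distance from each vector to the selected set so far
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, dist(v, vectors[nxt])) for d, v in zip(min_dist, vectors)]
    return selected

points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [5.0, 5.0]]
# farthest_first(points, 3) == [0, 1, 3]
```

This tends to pick vectors spread across the whole cluster, which is why the selections above mix clear matches, clear non-matches, and borderline cases.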
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and misclassify 0
Classified 35 matches and 39 non-matches
Purity of oracle classification: 0.527
Entropy of oracle classification: 0.998
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 245 weight vectors
Based on 35 matches and 39 non-matches
Classified 149 matches and 96 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 74
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.527027027027027, 0.9978913098356863, 0.47297297297297297)
(96, 0.527027027027027, 0.9978913098356863, 0.47297297297297297)
Current size of match and non-match training data sets: 35 / 39
Selected cluster with (queue ordering: random):
- Purity 0.53 and entropy 1.00
- Size 149 weight vectors
- Estimated match proportion 0.473
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 149 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.564, 1.000, 0.200, 0.170, 0.192, 0.176, 0.244] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.367, 1.000, 0.154, 0.174, 0.125, 0.240, 0.226] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.242, 0.121, 0.200, 0.171, 0.000] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and misclassify 0
Classified 42 matches and 16 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 16
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing file: diverg(10)870_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 870), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)870_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 873
Number of entity ID pairs that occurred more than once: 0
Analyse set of 873 weight vectors
Containing 155 true matches and 718 true non-matches
(17.75% true matches)
Identified 837 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 809 (96.65%)
2 : 25 (2.99%)
3 : 2 (0.24%)
8 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 837 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 139
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 697
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 865
Number of unique weight vectors: 836
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(836, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 836 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 836 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and misclassify 0
Classified 25 matches and 61 non-matches
Purity of oracle classification: 0.709
Entropy of oracle classification: 0.870
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 750 weight vectors
Based on 25 matches and 61 non-matches
Classified 88 matches and 662 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(88, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
(662, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
Current size of match and non-match training data sets: 25 / 61
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 662 weight vectors
- Estimated match proportion 0.291
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 662 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and misclassify 0
Classified 13 matches and 58 non-matches
Purity of oracle classification: 0.817
Entropy of oracle classification: 0.687
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing file: diverg(15)501_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976
recall 0.408027
f-measure 0.575472
da 125
dm 0
ndm 0
tp 122
fp 3
tn 4.76529e+07
fn 177
Name: (15, 1 - acm diverg, 501), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)501_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 406
Number of entity ID pairs that occurred more than once: 0
Analyse set of 406 weight vectors
Containing 142 true matches and 264 true non-matches
(34.98% true matches)
Identified 390 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 379 (97.18%)
2 : 8 (2.05%)
3 : 2 (0.51%)
5 : 1 (0.26%)
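A frequency-of-frequencies table like the one above can be produced from the raw vectors with `collections.Counter`; a sketch with made-up toy data (not the original implementation):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    vector_counts = Counter(tuple(v) for v in vectors)   # per-vector frequency
    return Counter(vector_counts.values())               # frequency of frequencies

# Toy data: one vector appears twice, two appear once
vectors = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.9]]
print(occurrence_distribution(vectors))  # 2 vectors occur once, 1 occurs twice
```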
Identified 0 non-pure unique weight vectors (from 390 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 128
0.000 : 262
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 406
Number of unique weight vectors: 390
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(390, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 390 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using "far" method
Farthest first selection of 77 weight vectors from 390 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
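Farthest-first selection, as used to draw the sample listed above, starts from one vector and then repeatedly takes the vector whose minimum distance to the already-selected set is largest. A compact sketch assuming Euclidean distance (the actual implementation may differ):

```python
import math

def farthest_first(vectors, k, start=0):
    """Select k vectors: begin with one, then repeatedly add the vector
    whose distance to its closest selected neighbour is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        # For each candidate, its distance to the closest selected vector
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(farthest_first(points, 3))  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

Because each new pick maximizes the distance to the current sample, the selection spreads across the cluster rather than concentrating in dense regions, which is why both clear matches and clear non-matches appear in every sample above.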
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 32 matches and 45 non-matches
Purity of oracle classification: 0.584
Entropy of oracle classification: 0.979
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 45
Number of false non-matches: 0
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 313 weight vectors
Based on 32 matches and 45 non-matches
Classified 89 matches and 224 non-matches
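The SVM step above trains on the oracle-labelled vectors and splits the remaining unlabelled vectors of the cluster into predicted matches and non-matches, which become the two new clusters in the queue. A sketch using scikit-learn's `SVC` as an assumed stand-in for the original classifier:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, unlabeled_vectors):
    """Train an SVM on oracle-labelled vectors and split the remaining
    unlabelled vectors into predicted matches (True) and non-matches (False)."""
    clf = SVC(kernel="linear")
    clf.fit(np.asarray(train_vectors), np.asarray(train_labels))
    preds = clf.predict(np.asarray(unlabeled_vectors))
    matches = [v for v, p in zip(unlabeled_vectors, preds) if p]
    non_matches = [v for v, p in zip(unlabeled_vectors, preds) if not p]
    return matches, non_matches

# Toy usage: two clearly separable training classes
matches, non_matches = svm_split(
    [[0.9, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.1]],
    [True, True, False, False],
    [[0.95, 0.85], [0.05, 0.15]],
)
print(len(matches), len(non_matches))  # 1 1
```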
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(89, 0.5844155844155844, 0.9793399259567799, 0.4155844155844156)
(224, 0.5844155844155844, 0.9793399259567799, 0.4155844155844156)
Current size of match and non-match training data sets: 32 / 45
Selected cluster with (queue ordering: random):
- Purity 0.58 and entropy 0.98
- Size 89 weight vectors
- Estimated match proportion 0.416
Sample size for this cluster: 46
Farthest first selection of 46 weight vectors from 89 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 46 weight vectors
The oracle will correctly classify 46 weight vectors and wrongly classify 0
Classified 42 matches and 4 non-matches
Purity of oracle classification: 0.913
Entropy of oracle classification: 0.426
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 46 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
125.0
Analysing the file: diverg(20)910_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 910), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)910_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
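The pureness filter above removes, for each non-pure unique weight vector, the copies belonging to its minority class (here 1 of 20 copies of the 0.950-pure vector, leaving 1100 of 1101 vectors). A sketch with illustrative names:

```python
def remove_minority_copies(groups):
    """groups: list of (match_count, non_match_count) per unique weight vector.
    For non-pure groups, drop the minority-class copies; report kept/removed."""
    removed = 0
    kept = 0
    for matches, non_matches in groups:
        if matches and non_matches:          # non-pure group
            removed += min(matches, non_matches)
            kept += max(matches, non_matches)
        else:                                # pure group: keep everything
            kept += matches + non_matches
    return kept, removed

# The run above: one group of 20 copies with pureness 0.95 (19 matches, 1 non-match)
kept, removed = remove_minority_copies([(19, 1), (5, 0), (0, 3)])
print(kept, removed)  # 27 1
```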
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)214_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 214), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)214_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 407
Number of entity ID pairs that occurred more than once: 0
Analyse set of 407 weight vectors
Containing 217 true matches and 190 true non-matches
(53.32% true matches)
Identified 370 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 352 (95.14%)
2 : 15 (4.05%)
3 : 2 (0.54%)
19 : 1 (0.27%)
Identified 1 non-pure unique weight vector (from 370 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 187
Removed 1 non-pure weight vector
Final number of weight vectors to use: 406
Number of unique weight vectors: 370
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(370, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 370 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 370 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 29 matches and 47 non-matches
Purity of oracle classification: 0.618
Entropy of oracle classification: 0.959
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 47
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 294 weight vectors
Based on 29 matches and 47 non-matches
Classified 145 matches and 149 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.618421052631579, 0.959149554396894, 0.3815789473684211)
(149, 0.618421052631579, 0.959149554396894, 0.3815789473684211)
Current size of match and non-match training data sets: 29 / 47
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 149 weight vectors
- Estimated match proportion 0.382
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 149 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
Perform oracle with 100.00 accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 9 matches and 48 non-matches
Purity of oracle classification: 0.842
Entropy of oracle classification: 0.629
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
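The purity and entropy figures printed for each oracle-classified sample are consistent with the majority-class fraction and the binary Shannon entropy of the match/non-match split. A minimal sketch (the helper name is illustrative, not from the original code):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    # Purity = fraction of the majority class; entropy = binary Shannon
    # entropy of the match / non-match split, as printed in the log.
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# 9 matches and 48 non-matches reproduce the 0.842 / 0.629 lines above:
purity, entropy = purity_entropy(9, 48)
```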
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analyzing file: diverg(15)798_NEW.csv
<class 'pandas.core.series.Series'>
Current row right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 798), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)798_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1050
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1050 weight vectors
Containing 208 true matches and 842 true non-matches
(19.81% true matches)
Identified 1003 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 968 (96.51%)
2 : 32 (3.19%)
3 : 2 (0.20%)
12 : 1 (0.10%)
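A frequency distribution like the one above can be produced with two nested counts: first count how often each distinct weight vector occurs, then tally how many distinct vectors share each occurrence count. A sketch (the function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # Count occurrences per distinct vector, then tally how many
    # distinct vectors share each occurrence count.
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())

# e.g. three singleton vectors and one vector appearing twice:
dist = occurrence_distribution([[0.1], [0.2], [0.3], [0.4], [0.4]])
# dist maps occurrence count -> number of vectors: {1: 3, 2: 1}
```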
Identified 1 non-pure unique weight vector (from 1003 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 821
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1049
Number of unique weight vectors: 1003
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1003, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1003 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1003 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
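The "far" selection named above is, under the usual reading, a greedy farthest-first traversal: start from a seed vector and repeatedly add the vector whose minimum Euclidean distance to the already-selected set is largest. A sketch under that assumption (the original may differ in seed choice or distance metric):

```python
import math

def farthest_first(vectors, k, seed=0):
    # Greedy farthest-first traversal: maintain, for every vector, its
    # minimum distance to the selected set; pick the maximizer each round.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [seed]
    min_dist = [dist(v, vectors[seed]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        far = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(far)
        min_dist = [min(d, dist(v, vectors[far]))
                    for d, v in zip(min_dist, vectors)]
    return selected
```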
Perform oracle with 100.00 accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 916 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 793 non-matches
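The SVM step trains on the oracle-labelled sample and splits the remaining cluster into predicted matches and non-matches. A minimal sketch assuming scikit-learn; the kernel and parameters are assumptions, not taken from the log:

```python
from sklearn.svm import SVC

def split_cluster(train_vecs, train_labels, remaining_vecs):
    # Fit an SVM on the oracle-classified sample (kernel choice is an
    # assumption), then partition the unlabelled weight vectors by the
    # predicted match / non-match class.
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if not p]
    return matches, non_matches
```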
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(793, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 123 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00 accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 46 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.149
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analyzing file: diverg(15)705_NEW.csv
<class 'pandas.core.series.Series'>
Current row right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979592
recall 0.32107
f-measure 0.483627
da 98
dm 0
ndm 0
tp 96
fp 2
tn 4.76529e+07
fn 203
Name: (15, 1 - acm diverg, 705), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)705_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 678
Number of entity ID pairs that occurred more than once: 0
Analyse set of 678 weight vectors
Containing 167 true matches and 511 true non-matches
(24.63% true matches)
Identified 659 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 646 (98.03%)
2 : 10 (1.52%)
3 : 2 (0.30%)
6 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 659 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 150
0.000 : 509
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 678
Number of unique weight vectors: 659
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(659, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 659 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 659 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00 accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 32 matches and 52 non-matches
Purity of oracle classification: 0.619
Entropy of oracle classification: 0.959
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 575 weight vectors
Based on 32 matches and 52 non-matches
Classified 114 matches and 461 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(114, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
(461, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
Current size of match and non-match training data sets: 32 / 52
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 114 weight vectors
- Estimated match proportion 0.381
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 114 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00 accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 45 matches and 6 non-matches
Purity of oracle classification: 0.882
Entropy of oracle classification: 0.523
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
98.0
Analyzing file: diverg(15)741_NEW.csv
<class 'pandas.core.series.Series'>
Current row right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 741), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)741_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 781
Number of entity ID pairs that occurred more than once: 0
Analyse set of 781 weight vectors
Containing 222 true matches and 559 true non-matches
(28.43% true matches)
Identified 727 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 690 (94.91%)
2 : 34 (4.68%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 727 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 538
Removed 1 non-pure weight vector
Final number of weight vectors to use: 780
Number of unique weight vectors: 727
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(727, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 727 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 727 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
Perform oracle with 100.00 accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 32 matches and 53 non-matches
Purity of oracle classification: 0.624
Entropy of oracle classification: 0.956
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 642 weight vectors
Based on 32 matches and 53 non-matches
Classified 301 matches and 341 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(301, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
(341, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
Current size of match and non-match training data sets: 32 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 301 weight vectors
- Estimated match proportion 0.376
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 301 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
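The farthest-first selection logged above can be sketched as a greedy max-min traversal: start from one vector, then repeatedly pick the vector whose minimum distance to the already-selected set is largest. Euclidean distance and a random start are assumptions here, not confirmed details of the tool:

```python
import math
import random

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal over a list of weight vectors.
    Returns k indices; each new pick maximises the minimum Euclidean
    distance to the vectors already selected."""
    rng = random.Random(seed)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [rng.randrange(len(vectors))]
    # minimum distance from every vector to the selected set so far
    mind = [dist(v, vectors[selected[0]]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=mind.__getitem__)
        selected.append(nxt)
        mind = [min(m, dist(v, vectors[nxt])) for m, v in zip(mind, vectors)]
    return selected
```

This spreads the sample over the cluster, which is why the selected vectors above mix clear matches, clear non-matches, and borderline cases.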
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 44 matches and 25 non-matches
Purity of oracle classification: 0.638
Entropy of oracle classification: 0.945
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(10)134_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 134), dtype: object
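The metrics in this Series are mutually consistent: with tp = 40, fp = 0 and fn = 259, the printed precision, recall and f-measure follow from the standard definitions (assumed to be what the script computes):

```python
tp, fp, fn = 40, 0, 259  # counts from the log above

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(precision, round(recall, 6), round(f_measure, 6))
# 1.0 0.133779 0.235988
```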
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)134_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 934
Number of entity ID pairs that occurred more than once: 0
Analyse set of 934 weight vectors
Containing 217 true matches and 717 true non-matches
(23.23% true matches)
Identified 879 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 843 (95.90%)
2 : 33 (3.75%)
3 : 2 (0.23%)
19 : 1 (0.11%)
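The occurrence histogram above ("how many weight vectors occur exactly k times") amounts to a counter of counters; vector values are assumed hashable as tuples:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Count how often each weight vector occurs, then count how many
    distinct vectors share each occurrence count."""
    per_vector = Counter(tuple(v) for v in vectors)
    return Counter(per_vector.values())

dist = occurrence_distribution([[1, 0], [1, 0], [0, 1], [0.5, 0.5]])
print(sorted(dist.items()))  # [(1, 2), (2, 1)]
```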
Identified 1 non-pure unique weight vector (from 879 unique weight vectors)
Pureness (as percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 696
Removed 1 non-pure weight vector
Final number of weight vectors to use: 933
Number of unique weight vectors: 879
Time to load and analyse the weight vector file: 0.01 sec
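The non-pure handling above (pureness = fraction of true matches among the copies of a unique vector; minority-class copies removed) can be sketched as follows. This is a reconstruction from the log, not the tool's actual code, and the tie-break toward matches at pureness 0.5 is an assumption:

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """labelled_vectors: list of (weight_vector_tuple, is_match).
    For each unique vector, keep only the majority-class copies
    (ties broken toward match here)."""
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[vec].append(is_match)
    kept = []
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)
        majority = pureness >= 0.5
        kept.extend((vec, lbl) for lbl in labels if lbl == majority)
    return kept

# 19 copies of one vector, 18 matches -> pureness 18/19 = 0.947,
# the single non-match copy is removed (mirrors the logged example)
data = [((1.0, 1.0), True)] * 18 + [((1.0, 1.0), False)] + [((0.0, 0.0), False)]
print(len(remove_minority_copies(data)))  # 19
```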
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(879, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 879 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 879 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 25 matches and 61 non-matches
Purity of oracle classification: 0.709
Entropy of oracle classification: 0.870
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 793 weight vectors
Based on 25 matches and 61 non-matches
Classified 133 matches and 660 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
(660, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
Current size of match and non-match training data sets: 25 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 133 weight vectors
- Estimated match proportion 0.291
Sample size for this cluster: 50
Farthest first selection of 50 weight vectors from 133 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 50 weight vectors
The oracle will correctly classify 50 weight vectors and wrongly classify 0
Classified 49 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.141
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 50 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(15)8_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (15, 1 - acm diverg, 8), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)8_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 187 true matches and 865 true non-matches
(17.78% true matches)
Identified 1010 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 979 (96.93%)
2 : 28 (2.77%)
3 : 2 (0.20%)
11 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1010 unique weight vectors)
Pureness (as percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 844
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 1010
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1010, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1010 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1010 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 923 weight vectors
Based on 24 matches and 63 non-matches
Classified 86 matches and 837 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(86, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(837, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 86 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 41
Farthest first selection of 41 weight vectors from 86 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 41 weight vectors
The oracle will correctly classify 41 weight vectors and wrongly classify 0
Classified 41 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 41 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing the file: diverg(20)890_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 890), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)890_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1073
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1073 weight vectors
Containing 226 true matches and 847 true non-matches
(21.06% true matches)
Identified 1016 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 979 (96.36%)
2 : 34 (3.35%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1016 unique weight vectors)
Pureness (as percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 826
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1072
Number of unique weight vectors: 1016
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1016, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1016 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1016 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
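The purity and entropy figures in these reports can be reproduced with the usual two-class definitions; a minimal sketch, assuming purity is the majority-class proportion and entropy the binary Shannon entropy of the match/non-match split:

```python
import math

def purity(n_match, n_nonmatch):
    # Majority-class proportion of the classified sample
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    # Binary Shannon entropy (base 2) of the match/non-match split
    total = n_match + n_nonmatch
    h = 0.0
    for n in (n_match, n_nonmatch):
        p = n / total
        if p > 0.0:
            h -= p * math.log2(p)
    return h

# Figures from the run above: 31 matches, 56 non-matches
print(round(purity(31, 56), 3))   # 0.644
print(round(entropy(31, 56), 3))  # 0.940
```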
SVM classification of 929 weight vectors
Based on 31 matches and 56 non-matches
Classified 332 matches and 597 non-matches
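The SVM step above, which uses the 87 oracle-labelled vectors to split the remaining 929 cluster members into predicted matches and non-matches, could look roughly like this scikit-learn sketch (the arrays here are hypothetical stand-ins, not the actual weight vectors):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-ins for the real data: 7-dimensional similarity
# weight vectors, with the oracle's labels for the sampled ones.
rng = np.random.default_rng(0)
X_labelled = rng.random((87, 7))
y_labelled = np.array([True] * 31 + [False] * 56)   # oracle labels
X_rest = rng.random((929, 7))                       # still unlabelled

# Train on the oracle-labelled sample, then split the rest of the
# cluster into predicted matches and non-matches.
clf = SVC(kernel="linear").fit(X_labelled, y_labelled)
pred = clf.predict(X_rest)
matches, non_matches = X_rest[pred], X_rest[~pred]
print(len(matches), len(non_matches))
```

The predicted match and non-match subsets then become the two child clusters pushed onto the queue.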
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(332, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(597, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 597 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 597 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.692, 0.583, 0.500, 0.750, 0.731] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
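The "farthest first" sampling performed in every loop above is a standard farthest-first traversal: pick one vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. A generic sketch (not the tool's actual code):

```python
import math
import random

def farthest_first(vectors, k, seed=42):
    """Farthest-first traversal: greedily pick the vector that is
    farthest from all previously selected vectors."""
    random.seed(seed)
    selected = [random.choice(vectors)]
    # Distance from each candidate to its closest selected vector
    min_d = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=min_d.__getitem__)
        selected.append(vectors[i])
        min_d = [min(d, math.dist(v, vectors[i]))
                 for d, v in zip(min_d, vectors)]
    return selected

vectors = [[random.random() for _ in range(7)] for _ in range(272)]
sample = farthest_first(vectors, 71)
print(len(sample))  # 71
```

This greedy choice is what makes the selected samples spread out over the cluster, so the oracle labels cover its extremes rather than only its centre.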
39.0
Analysing the file: diverg(15)841_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 841), dtype: object
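The precision, recall, and f-measure fields printed in the Series above follow the standard definitions, and the reported values can be checked from the tp/fp/fn counts (tp = 58, fp = 0, fn = 241):

```python
# Standard binary-classification metrics, applied to the counts
# reported in the Series above.
def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = prf(tp=58, fp=0, fn=241)
print(round(p, 5), round(r, 5), round(f, 5))  # 1.0 0.19398 0.32493
```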
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)841_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 295
Number of entity ID pairs that occurred more than once: 0
Analyse set of 295 weight vectors
Containing 186 true matches and 109 true non-matches
(63.05% true matches)
Identified 272 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 260 (95.59%)
2 : 9 (3.31%)
3 : 2 (0.74%)
11 : 1 (0.37%)
Identified 1 non-pure unique weight vector (from 272 unique weight vectors)
Pureness (as a proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 163
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 108
Removed 1 non-pure weight vector
Final number of weight vectors to use: 294
Number of unique weight vectors: 272
Time to load and analyse the weight vector file: 0.00 sec
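The analysis steps above (counting unique weight vectors and their occurrence frequencies, computing the pureness of each unique vector, then dropping minority-class copies of non-pure vectors) can be sketched as follows; the `records` list is a hypothetical stand-in for the loaded CSV data:

```python
from collections import Counter

# Hypothetical stand-in for the loaded data: (weight_vector, is_match)
# pairs, one per compared entity ID pair.
records = [((1.0, 0.9), True), ((1.0, 0.9), True), ((0.2, 0.1), False),
           ((0.5, 0.5), True), ((0.5, 0.5), False), ((0.5, 0.5), True)]

# Frequency distribution: how often each unique weight vector occurs
freq = Counter(vec for vec, _ in records)
dist = Counter(freq.values())  # occurrence count -> number of vectors

# Pureness of each unique vector = proportion of matches among its copies
matches = Counter(vec for vec, is_match in records if is_match)
pureness = {vec: matches[vec] / n for vec, n in freq.items()}

# Remove minority-class copies of non-pure unique weight vectors
cleaned = [(vec, m) for vec, m in records
           if pureness[vec] in (0.0, 1.0)
           or m == (pureness[vec] > 0.5)]
print(len(cleaned))  # 5
```

In this toy data the vector `(0.5, 0.5)` occurs three times with pureness 2/3, so its single non-match copy is removed, mirroring the "minority class weight vectors ... to be removed" line above.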
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(272, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 272 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 71
Perform initial selection using "far" method
Farthest first selection of 71 weight vectors from 272 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 34 matches and 37 non-matches
Purity of oracle classification: 0.521
Entropy of oracle classification: 0.999
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 37
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 201 weight vectors
Based on 34 matches and 37 non-matches
Classified 140 matches and 61 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 71
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.5211267605633803, 0.9987117514654895, 0.4788732394366197)
(61, 0.5211267605633803, 0.9987117514654895, 0.4788732394366197)
Current size of match and non-match training data sets: 34 / 37
Selected cluster (queue ordering: random) with:
- Purity 0.52 and entropy 1.00
- Size 140 weight vectors
- Estimated match proportion 0.479
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 140 vectors
The selected farthest weight vectors are:
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 49 matches and 8 non-matches
Purity of oracle classification: 0.860
Entropy of oracle classification: 0.585
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(20)774_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 774), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)774_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as a proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)530_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 530), dtype: object
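The precision, recall, and f-measure figures in the header above follow directly from the tp/fp/fn counts via the standard formulas. A minimal sketch (the function name is illustrative, not from the original script):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision/recall/F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Counts reported above: tp=39, fp=0, fn=260
p, r, f = precision_recall_f1(39, 0, 260)
print(f"{p:.6g} {r:.6g} {f:.6g}")  # 1 0.130435 0.230769
```

With fp=0 precision is exactly 1, and recall 39/299 ≈ 0.130435 reproduces the logged value.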
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)530_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1092
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1092 weight vectors
Containing 226 true matches and 866 true non-matches
(20.70% true matches)
Identified 1035 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 998 (96.43%)
2 : 34 (3.29%)
3 : 2 (0.19%)
20 : 1 (0.10%)
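A frequency distribution like the one above can be computed by hashing each weight vector as a tuple and counting with `collections.Counter` (a minimal sketch; the function name is an assumption):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Tally how many unique weight vectors occur 1x, 2x, 3x, ..."""
    per_vector = Counter(tuple(v) for v in weight_vectors)  # vector -> count
    distribution = Counter(per_vector.values())  # count -> number of unique vectors
    total_unique = len(per_vector)
    for occurrence, count in sorted(distribution.items()):
        print(f"{occurrence} : {count} ({100.0 * count / total_unique:.2f}%)")
    return distribution

# Three unique vectors: two occur once, one occurs twice.
occurrence_distribution([[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.1]])
```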
Identified 1 non-pure unique weight vector (from 1035 unique weight vectors)
Pureness (fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1091
Number of unique weight vectors: 1035
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1035, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1035 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1035 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
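The "far" selection logged above is a greedy farthest-first traversal: after a first pick, each subsequent vector is the one maximising the distance to its nearest already-selected vector. A minimal sketch assuming a Euclidean metric (the function name and the random first pick are assumptions):

```python
import math
import random

def farthest_first(vectors, k, seed=0):
    """Greedy max-min selection of k vectors from a candidate pool."""
    random.seed(seed)
    selected = [random.randrange(len(vectors))]
    # min_dist[i] = distance from vectors[i] to its closest selected vector
    min_dist = [math.dist(v, vectors[selected[0]]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[nxt]))
    return [vectors[i] for i in selected]

picked = farthest_first([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [2.0, 2.0]], 3)
```

Each round runs in O(n) after updating the cached minimum distances, so selecting k of n vectors costs O(nk) distance evaluations.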
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
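The purity and entropy figures reported for each oracle classification (and for the clusters in the queue) follow from the match/non-match split alone: purity is the majority-class fraction, entropy the binary Shannon entropy of the split. A sketch:

```python
import math

def purity_and_entropy(n_matches, n_non_matches):
    """Majority-class purity and binary Shannon entropy (bits) of a split."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The 25-match / 63-non-match split above yields purity 0.716, entropy 0.861
purity, entropy = purity_and_entropy(25, 63)
```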
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 947 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 816 non-matches
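The SVM step trains on the oracle-labelled vectors and splits the remaining unlabelled ones into the two child clusters. A minimal sketch using scikit-learn's `SVC` (the linear kernel and the function name are assumptions; the log does not show the original kernel choice):

```python
from sklearn import svm

def classify_remaining(train_matches, train_non_matches, unlabeled):
    """Fit an SVM on oracle-labelled weight vectors, then split the
    unlabelled ones into predicted matches (1) and non-matches (0)."""
    X = train_matches + train_non_matches
    y = [1] * len(train_matches) + [0] * len(train_non_matches)
    clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(X, y)
    labels = clf.predict(unlabeled)
    matches = [v for v, lab in zip(unlabeled, labels) if lab == 1]
    non_matches = [v for v, lab in zip(unlabeled, labels) if lab == 0]
    print(f"Classified {len(matches)} matches and {len(non_matches)} non-matches")
    return matches, non_matches
```

The two predicted groups then re-enter the queue as separate clusters, inheriting the parent's purity/entropy estimates until they are sampled themselves.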
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(816, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 131 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)665_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 665), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)665_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 845
Number of entity ID pairs that occurred more than once: 0
Analyse set of 845 weight vectors
Containing 227 true matches and 618 true non-matches
(26.86% true matches)
Identified 788 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 751 (95.30%)
2 : 34 (4.31%)
3 : 2 (0.25%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 788 unique weight vectors)
Pureness (fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 597
Removed 1 non-pure weight vector
Final number of weight vectors to use: 844
Number of unique weight vectors: 788
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(788, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 788 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 788 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 703 weight vectors
Based on 29 matches and 56 non-matches
Classified 162 matches and 541 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(162, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(541, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 162 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 162 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)423_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 423), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)423_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 30 matches and 58 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 30 matches and 58 non-matches
Classified 179 matches and 760 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(179, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
(760, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
Current size of match and non-match training data sets: 30 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 179 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 179 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
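The farthest-first sampling that produces lists like the one above can be sketched as a greedy max-min selection: repeatedly pick the vector whose minimum distance to the already-selected set is largest. Euclidean distance and the choice of the first seed vector are assumptions here; the actual tool may differ in both:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors: repeatedly pick the
    vector whose minimum distance to the selected set is largest."""
    selected = [vectors[0]]                     # arbitrary starting seed
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        # Update each vector's distance to its nearest selected vector.
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[idx]))
    return selected

print(farthest_first([(0, 0), (1, 0), (10, 0), (5, 0)], 2))
```

A selected vector's minimum distance drops to zero, so it is never picked twice as long as k does not exceed the number of distinct vectors.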
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 43 matches and 15 non-matches
Purity of oracle classification: 0.741
Entropy of oracle classification: 0.825
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 15
Number of false non-matches: 0
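The purity and entropy values reported after each oracle round follow from the two class counts alone: purity is the majority-class fraction and entropy the two-class Shannon entropy. A minimal check against the numbers above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Two-class purity and Shannon entropy of an oracle-labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total            # match proportion
    purity = max(p, 1.0 - p)           # fraction in the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 43 matches and 15 non-matches, as classified above:
purity, entropy = purity_entropy(43, 15)
print(round(purity, 3), round(entropy, 3))  # 0.741 0.825
```

The same formulas reproduce the earlier round (30 matches, 58 non-matches: purity 0.659, entropy 0.926), which is also the value propagated to both clusters in the Loop 2 queue.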
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)411_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984615
recall 0.214047
f-measure 0.351648
da 65
dm 0
ndm 0
tp 64
fp 1
tn 4.76529e+07
fn 235
Name: (10, 1 - acm diverg, 411), dtype: object
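The precision, recall and f-measure rows in these per-file summaries are the standard quantities derived from the tp/fp/fn counts printed alongside them; a quick reproduction using the counts above:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw classification counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# tp=64, fp=1, fn=235 as reported for diverg(10)411_NEW.csv
p, r, f = prf(64, 1, 235)
print(round(p, 6), round(r, 6), round(f, 6))  # 0.984615 0.214047 0.351648
```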
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)411_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 569
Number of entity ID pairs that occurred more than once: 0
Analyse set of 569 weight vectors
Containing 186 true matches and 383 true non-matches
(32.69% true matches)
Identified 540 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 525 (97.22%)
2 : 12 (2.22%)
3 : 2 (0.37%)
14 : 1 (0.19%)
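The occurrence histogram above counts how often each exact weight vector repeats, with percentages taken over the unique vectors. With vectors represented as tuples this is two nested `Counter`s; the five toy vectors below are made up:

```python
from collections import Counter

vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (1.0, 0.5), (0.8, 0.1)]
occurrences = Counter(vectors)              # vector -> how often it occurs
histogram = Counter(occurrences.values())   # occurrence count -> #vectors
total = len(occurrences)                    # number of unique vectors
for count in sorted(histogram):
    n = histogram[count]
    print(f"{count} : {n} ({100.0 * n / total:.2f}%)")
```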
Identified 1 non-pure unique weight vector (from 540 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 159
0.929 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 380
Removed 1 non-pure weight vector
Final number of weight vectors to use: 568
Number of unique weight vectors: 540
Time to load and analyse the weight vector file: 0.01 sec
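The pureness filter applied during loading computes, per unique weight vector, the fraction of its record pairs that are true matches, and removes the minority-class pairs of any vector with a mixed label (such as the 0.929 vector above, one minority pair among 14). A sketch under that reading; the majority tie-break at 0.5 is an assumption:

```python
from collections import defaultdict

def filter_non_pure(pairs):
    """pairs: list of (weight_vector_tuple, is_match). Remove the
    minority-class pairs of every unique vector with mixed labels."""
    by_vector = defaultdict(list)
    for vec, is_match in pairs:
        by_vector[vec].append(is_match)
    kept = []
    for vec, labels in by_vector.items():
        pureness = sum(labels) / len(labels)   # fraction of true matches
        majority = pureness >= 0.5             # assumed tie-break
        for is_match in labels:
            if pureness in (0.0, 1.0) or is_match == majority:
                kept.append((vec, is_match))
    return kept

# One vector occurring 14 times with 13 matches (pureness 13/14 = 0.929):
pairs = [(('a',), True)] * 13 + [(('a',), False)] + [(('b',), False)] * 3
print(len(filter_non_pure(pairs)))  # 16 (the single minority pair removed)
```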
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(540, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 540 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 540 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 30 matches and 51 non-matches
Purity of oracle classification: 0.630
Entropy of oracle classification: 0.951
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 459 weight vectors
Based on 30 matches and 51 non-matches
Classified 137 matches and 322 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(137, 0.6296296296296297, 0.9509560484549725, 0.37037037037037035)
(322, 0.6296296296296297, 0.9509560484549725, 0.37037037037037035)
Current size of match and non-match training data sets: 30 / 51
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 322 weight vectors
- Estimated match proportion 0.370
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 322 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[0.790, 0.000, 0.636, 0.619, 0.429, 0.450, 0.609] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.778, 0.429, 0.571, 0.750, 0.600] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.481, 0.643, 0.667, 0.350, 0.643] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 0 matches and 70 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
65.0
Analysing the file: diverg(15)296_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.99
recall 0.331104
f-measure 0.496241
da 100
dm 0
ndm 0
tp 99
fp 1
tn 4.76529e+07
fn 200
Name: (15, 1 - acm diverg, 296), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)296_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1006
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1006 weight vectors
Containing 166 true matches and 840 true non-matches
(16.50% true matches)
Identified 967 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 938 (97.00%)
2 : 26 (2.69%)
3 : 2 (0.21%)
10 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 967 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 147
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 819
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1005
Number of unique weight vectors: 967
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(967, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 967 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 967 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 880 weight vectors
Based on 24 matches and 63 non-matches
Classified 83 matches and 797 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(83, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(797, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 83 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 40
Farthest first selection of 40 weight vectors from 83 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.929, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 40 weight vectors
The oracle will correctly classify 40 weight vectors and wrongly classify 0
Classified 39 matches and 1 non-match
Purity of oracle classification: 0.975
Entropy of oracle classification: 0.169
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 40 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing the file: diverg(10)958_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 958), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)958_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 695
Number of entity ID pairs that occurred more than once: 0
Analyse set of 695 weight vectors
Containing 202 true matches and 493 true non-matches
(29.06% true matches)
Identified 669 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 655 (97.91%)
2 : 11 (1.64%)
3 : 2 (0.30%)
12 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 669 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 492
Removed 1 non-pure weight vector
Final number of weight vectors to use: 694
Number of unique weight vectors: 669
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(669, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 669 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 669 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 32 matches and 52 non-matches
Purity of oracle classification: 0.619
Entropy of oracle classification: 0.959
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
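The purity and entropy values reported above follow the standard definitions for a two-class split: purity is the majority-class fraction, and entropy is the base-2 Shannon entropy of the class proportions. A minimal sketch (the function names are illustrative, not taken from the original code):

```python
import math

def purity(n_match: int, n_nonmatch: int) -> float:
    """Fraction of samples belonging to the majority class."""
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match: int, n_nonmatch: int) -> float:
    """Shannon entropy (base 2) of the match/non-match proportions."""
    total = n_match + n_nonmatch
    h = 0.0
    for n in (n_match, n_nonmatch):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

# Values from the oracle step above: 32 matches, 52 non-matches
print(round(purity(32, 52), 3))   # 0.619
print(round(entropy(32, 52), 3))  # 0.959
```

These reproduce the 0.619 purity and 0.959 entropy printed for the 32/52 split.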
SVM classification of 585 weight vectors
Based on 32 matches and 52 non-matches
Classified 142 matches and 443 non-matches
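The SVM step trains on the oracle-labelled vectors and then classifies the remaining unlabelled ones. A hedged sketch with scikit-learn (the kernel, parameters, and random stand-in data are assumptions; the log does not state the actual SVM configuration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the 7-dimensional weight vectors: 84 oracle-labelled
# (32 matches, 52 non-matches) and 585 still unlabelled, as in the run above
X_train = rng.random((84, 7))
y_train = np.array([1] * 32 + [0] * 52)
X_rest = rng.random((585, 7))

# RBF kernel is an assumption, not stated in the log
clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)
pred = clf.predict(X_rest)
print(f"Classified {pred.sum()} matches and {len(pred) - pred.sum()} non-matches")
```

The predicted labels then determine how the cluster is split into candidate match and non-match sub-clusters.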
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
(443, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
Current size of match and non-match training data sets: 32 / 52
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 443 weight vectors
- Estimated match proportion 0.381
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 443 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.565, 0.667, 0.600, 0.412, 0.381] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.750, 0.667, 0.444, 0.765, 0.714] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.818, 0.762, 0.714, 0.500, 0.400] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.455, 0.714, 0.429, 0.550, 0.647] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
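Farthest-first selection greedily adds, at each step, the vector whose distance to its nearest already-selected vector is largest, giving a spread-out sample like the one listed above. A minimal sketch (Euclidean distance and a random starting vector are assumptions):

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Greedy farthest-first traversal: start from one vector, then
    repeatedly add the vector maximising the distance to its closest
    already-selected vector."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vectors)))]
    # min_dist[i] = distance from vector i to its closest selected vector
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        d = np.linalg.norm(vectors - vectors[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)
    return vectors[selected]

# e.g. 75 vectors out of 443, as in the step above (random stand-in data)
sample = farthest_first(np.random.default_rng(1).random((443, 7)), 75)
print(sample.shape)  # (75, 7)
```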
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 4 matches and 71 non-matches
Purity of oracle classification: 0.947
Entropy of oracle classification: 0.300
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analyzing file: diverg(15)795_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 795), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)795_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1074
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1074 weight vectors
Containing 208 true matches and 866 true non-matches
(19.37% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 992 (96.59%)
2 : 32 (3.12%)
3 : 2 (0.19%)
12 : 1 (0.10%)
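A frequency distribution like the one above can be produced with `collections.Counter`; a sketch on stand-in occurrence counts matching the figures reported (992 singletons, 32 pairs, 2 triples, 1 twelve-fold duplicate, 1027 unique vectors in total):

```python
from collections import Counter

# Occurrence count per unique weight vector (toy stand-in data)
occurrences = [1] * 992 + [2] * 32 + [3] * 2 + [12]
freq = Counter(occurrences)
total = len(occurrences)
for occ in sorted(freq):
    count = freq[occ]
    print(f"{occ} : {count} ({100 * count / total:.2f}%)")
```

Running this reproduces the four lines of the table above, percentages included.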
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1073
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 25 matches and 63 non-matches
Classified 121 matches and 818 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(121, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(818, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 121 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 121 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 46 matches and 2 non-matches
Purity of oracle classification: 0.958
Entropy of oracle classification: 0.250
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analyzing file: diverg(10)519_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 519), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)519_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 680
Number of entity ID pairs that occurred more than once: 0
Analyse set of 680 weight vectors
Containing 198 true matches and 482 true non-matches
(29.12% true matches)
Identified 635 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 601 (94.65%)
2 : 31 (4.88%)
3 : 2 (0.31%)
11 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 635 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 461
Removed 1 non-pure weight vector
Final number of weight vectors to use: 679
Number of unique weight vectors: 635
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(635, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 635 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 635 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 30 matches and 53 non-matches
Purity of oracle classification: 0.639
Entropy of oracle classification: 0.944
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 552 weight vectors
Based on 30 matches and 53 non-matches
Classified 194 matches and 358 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(194, 0.6385542168674698, 0.943876757128791, 0.3614457831325301)
(358, 0.6385542168674698, 0.943876757128791, 0.3614457831325301)
Current size of match and non-match training data sets: 30 / 53
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 358 weight vectors
- Estimated match proportion 0.361
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 358 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.700, 0.429, 0.476, 0.647, 0.810] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.857, 0.875, 0.625, 0.333, 0.667] (False)
[1.000, 0.000, 0.741, 0.556, 0.667, 0.350, 0.556] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.917, 0.786, 0.263, 0.500, 0.556] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.407, 0.643, 0.667, 0.500, 0.563] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(20)97_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 97), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)97_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
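The "frequency distribution of occurrences" reported for each weight vector file can be reproduced with two nested counts; a minimal sketch (the function name is my own, not from the tool):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # First count how often each identical weight vector occurs,
    # then count how many vectors share each occurrence count.
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())
```

For the first file above, this would yield {1: 982, 2: 34, 3: 2, 20: 1}, matching the distribution logged.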
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
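The "farthest first selection" step above corresponds to a greedy farthest-first traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A sketch under assumptions (Euclidean distance and a fixed start index; the tool's actual distance metric and seeding are not shown in the log):

```python
def farthest_first(vectors, k, start=0):
    # Greedy farthest-first traversal over a list of weight vectors.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    selected = [start]
    # min_dist[i] = distance from vectors[i] to its nearest selected vector
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            d = dist(v, vectors[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

This greedy scheme is what makes the sampled vectors spread across the cluster, which is why the lists above mix clear matches and clear non-matches.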
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
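The purity and entropy figures printed after each oracle step follow from the match / non-match counts alone; a minimal sketch (function names are my own):

```python
import math

def cluster_purity(num_matches, num_non_matches):
    # Purity: fraction of the classified sample in its majority class.
    total = num_matches + num_non_matches
    return max(num_matches, num_non_matches) / total

def cluster_entropy(num_matches, num_non_matches):
    # Binary Shannon entropy (in bits) of the match / non-match split.
    total = num_matches + num_non_matches
    p = num_matches / total
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

For 23 matches and 64 non-matches this gives purity 0.736 and entropy 0.833, matching the values logged above; for 0 matches and 71 non-matches it gives purity 1.000 and entropy 0.000.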
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
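The "SVM classification ... based on N matches and M non-matches" step trains a binary classifier on the oracle-labelled sample and propagates labels to the cluster's remaining vectors. A sketch assuming scikit-learn; the kernel and parameters here are assumptions, since the tool's actual SVM settings do not appear in the log:

```python
from sklearn.svm import SVC

def svm_classify_remaining(labelled, labels, unlabelled):
    # Train on the oracle-classified weight vectors (label 1 = match,
    # 0 = non-match) and predict a label for every unlabelled vector.
    clf = SVC(kernel="linear")
    clf.fit(labelled, labels)
    return clf.predict(unlabelled)
```

The predicted matches and non-matches then form the two child clusters pushed onto the queue for the next loop.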
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)647_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 647), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)647_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 283
Number of entity ID pairs that occurred more than once: 0
Analyse set of 283 weight vectors
Containing 195 true matches and 88 true non-matches
(68.90% true matches)
Identified 254 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 242 (95.28%)
2 : 9 (3.54%)
3 : 2 (0.79%)
17 : 1 (0.39%)
Identified 1 non-pure unique weight vector (from 254 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 87
Removed 1 non-pure weight vector
Final number of weight vectors to use: 282
Number of unique weight vectors: 254
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(254, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 254 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 70
Perform initial selection using "far" method
Farthest first selection of 70 weight vectors from 254 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 38 matches and 32 non-matches
Purity of oracle classification: 0.543
Entropy of oracle classification: 0.995
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 32
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 184 weight vectors
Based on 38 matches and 32 non-matches
Classified 136 matches and 48 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 70
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.5428571428571428, 0.9946937953613058, 0.5428571428571428)
(48, 0.5428571428571428, 0.9946937953613058, 0.5428571428571428)
Current size of match and non-match training data sets: 38 / 32
Selected cluster with (queue ordering: random):
- Purity 0.54 and entropy 0.99
- Size 136 weight vectors
- Estimated match proportion 0.543
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 136 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.971, 0.952, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 46 matches and 10 non-matches
Purity of oracle classification: 0.821
Entropy of oracle classification: 0.677
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 10
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing the file: diverg(15)144_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 144), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)144_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 526
Number of entity ID pairs that occurred more than once: 0
Analyse set of 526 weight vectors
Containing 208 true matches and 318 true non-matches
(39.54% true matches)
Identified 497 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 480 (96.58%)
2 : 14 (2.82%)
3 : 2 (0.40%)
12 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 497 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 315
Removed 1 non-pure weight vector
Final number of weight vectors to use: 525
Number of unique weight vectors: 497
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(497, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 497 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 497 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 35 matches and 45 non-matches
Purity of oracle classification: 0.562
Entropy of oracle classification: 0.989
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 45
Number of false non-matches: 0
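The purity and entropy reported for each oracle sample follow the standard two-class definitions: purity is the proportion of the majority class, and entropy is the binary Shannon entropy (in bits) of the match/non-match split. Checking against the sample above (35 matches, 45 non-matches):

```python
import math

def purity_entropy(num_match, num_nonmatch):
    total = num_match + num_nonmatch
    p = num_match / total          # match proportion
    purity = max(p, 1.0 - p)       # proportion of the majority class
    # Binary Shannon entropy in bits; a zero proportion contributes nothing
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)
    return purity, entropy

print(purity_entropy(35, 45))  # → (0.5625, 0.98869...), matching the log
```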
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 417 weight vectors
Based on 35 matches and 45 non-matches
Classified 142 matches and 275 non-matches
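The SVM step propagates the oracle's 80 labels to the 417 unclassified weight vectors, splitting them into a predicted-match and a predicted-non-match sub-cluster (the 142/275 split above). A sketch using scikit-learn; the kernel and all parameters are assumptions, since the log does not state them:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labeled_vecs, labels, unlabeled_vecs):
    """Train an SVM on oracle-labelled weight vectors and split the
    unlabelled ones into predicted matches / non-matches."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(np.asarray(labeled_vecs), np.asarray(labels))
    pred = clf.predict(np.asarray(unlabeled_vecs))
    matches = [v for v, p in zip(unlabeled_vecs, pred) if p == 1]
    nonmatches = [v for v, p in zip(unlabeled_vecs, pred) if p == 0]
    return matches, nonmatches
```

Both sub-clusters go back onto the queue, which is why the queue length rises to 2 at the start of Loop 2.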
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.5625, 0.9886994082884974, 0.4375)
(275, 0.5625, 0.9886994082884974, 0.4375)
Current size of match and non-match training data sets: 35 / 45
Selected cluster (queue ordering: random) with:
- Purity 0.56 and entropy 0.99
- Size 275 weight vectors
- Estimated match proportion 0.438
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 275 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 5 matches and 65 non-matches
Purity of oracle classification: 0.929
Entropy of oracle classification: 0.371
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
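Each file in this log is processed by the same outer loop: pop a cluster from the queue, have the oracle label a farthest-first sample, add the labels to the training data, split the remainder, and push the sub-clusters back until the manual-classification budget runs out. A toy rendition, using a first-k sample and a nearest-centroid split as stand-ins for the farthest-first selection and the SVM (all names and parameters here are illustrative):

```python
import math
from collections import deque

def centroid(vecs):
    # Component-wise mean of a list of equal-length vectors
    return [sum(c) / len(c) for c in zip(*vecs)]

def run_budgeted_labelling(records, budget, sample_size=2):
    """Toy rendition of the log's outer loop.  Each record is a
    (vector, true_is_match) pair; the oracle simply reveals the label."""
    queue = deque([list(records)])
    used = 0
    matches, nonmatches = [], []          # oracle-labelled training data
    while queue and used + sample_size <= budget:
        cluster = queue.popleft()
        sample, rest = cluster[:sample_size], cluster[sample_size:]
        used += len(sample)               # manual classifications spent
        matches += [v for v, m in sample if m]
        nonmatches += [v for v, m in sample if not m]
        if not rest or not matches or not nonmatches:
            continue
        cm, cn = centroid(matches), centroid(nonmatches)
        pred_m = [r for r in rest if math.dist(r[0], cm) <= math.dist(r[0], cn)]
        pred_n = [r for r in rest if math.dist(r[0], cm) > math.dist(r[0], cn)]
        # Push both sub-clusters back, as in the log's growing queue
        for part in (pred_m, pred_n):
            if part:
                queue.append(part)
    return used, matches, nonmatches
```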
Analyzing file: diverg(15)448_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 448), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)448_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 738
Number of entity ID pairs that occurred more than once: 0
Analyse set of 738 weight vectors
Containing 217 true matches and 521 true non-matches
(29.40% true matches)
Identified 703 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 688 (97.87%)
2 : 12 (1.71%)
3 : 2 (0.28%)
20 : 1 (0.14%)
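The table above is a count of counts: first count how often each unique weight vector occurs, then count how many vectors share each occurrence count. With `collections.Counter` (the short vectors here are made-up illustrations):

```python
from collections import Counter

# Hypothetical duplicated weight vectors
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3),
           (0.9, 0.9), (0.9, 0.9), (0.9, 0.9)]

occurrences = Counter(vectors)                # vector -> how often it occurs
distribution = Counter(occurrences.values())  # occurrence count -> number of vectors

print(dict(distribution))  # one vector occurs once, one twice, one three times
```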
Identified 1 non-pure unique weight vector (from 703 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 520
Removed 1 non-pure weight vector
Final number of weight vectors to use: 737
Number of unique weight vectors: 703
Time to load and analyse the weight vector file: 0.01 sec
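The non-pure filter groups identical weight vectors, computes each group's pureness (the fraction of copies labelled as matches), keeps only the majority-class copies of a sufficiently pure group, and drops an impure group entirely. A sketch; the 0.9 threshold and the (vector, is_match) record format are assumptions, inferred from the log removing 1 minority copy at pureness 0.950 here but all 8 copies of a 0.875-pure vector in a later run:

```python
from collections import defaultdict

def remove_non_pure(records, min_pureness=0.9):
    """Per unique weight vector, compute pureness (fraction of match
    copies).  Pure-enough groups keep only their majority-class copies;
    mixed groups below the threshold are dropped entirely."""
    groups = defaultdict(list)
    for vec, is_match in records:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)
        if pureness in (0.0, 1.0):              # already pure: keep all copies
            kept += [(vec, labels[0])] * len(labels)
        elif max(pureness, 1 - pureness) >= min_pureness:
            majority = pureness >= 0.5          # tie-break is an assumption
            kept += [(vec, majority)] * labels.count(majority)
        # else: mixed and not pure enough -> drop the whole group
    return kept
```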
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(703, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 703 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 703 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 619 weight vectors
Based on 27 matches and 57 non-matches
Classified 129 matches and 490 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(129, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(490, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 129 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 129 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.420, 1.000, 1.000, 1.000, 1.000, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 48 matches and 3 non-matches
Purity of oracle classification: 0.941
Entropy of oracle classification: 0.323
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analyzing file: diverg(10)122_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 122), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)122_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 891
Number of entity ID pairs that occurred more than once: 0
Analyse set of 891 weight vectors
Containing 199 true matches and 692 true non-matches
(22.33% true matches)
Identified 836 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 800 (95.69%)
2 : 33 (3.95%)
3 : 2 (0.24%)
19 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 836 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 164
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 671
Removed 1 non-pure weight vector
Final number of weight vectors to use: 890
Number of unique weight vectors: 836
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(836, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 836 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 836 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 750 weight vectors
Based on 28 matches and 58 non-matches
Classified 184 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(184, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 566 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 566 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.565, 0.667, 0.600, 0.412, 0.381] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.368, 0.235, 0.429] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 0 matches and 73 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analyzing file: diverg(10)800_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 800), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)800_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 268
Number of entity ID pairs that occurred more than once: 0
Analyse set of 268 weight vectors
Containing 152 true matches and 116 true non-matches
(56.72% true matches)
Identified 253 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 246 (97.23%)
2 : 4 (1.58%)
3 : 2 (0.79%)
8 : 1 (0.40%)
Identified 1 non-pure unique weight vector (from 253 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 137
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 115
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 260
Number of unique weight vectors: 252
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(252, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 252 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 70
Perform initial selection using "far" method
Farthest first selection of 70 weight vectors from 252 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
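The "far" selection above can be sketched as a greedy farthest-first traversal: seed with one vector, then repeatedly add the vector whose minimum distance to the selected set is largest. The Euclidean metric and first-vector seed are assumptions, since the log does not state them:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: seed with the first vector, then
    repeatedly add the vector that maximises the minimum Euclidean
    distance to all vectors selected so far (seed and metric assumed)."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# The outlier (10, 10) is picked before the nearby (1, 1)
print(farthest_first([(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)], 2))
```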
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 35 matches and 35 non-matches
Purity of oracle classification: 0.500
Entropy of oracle classification: 1.000
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 35
Number of false non-matches: 0
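The purity and entropy figures reported after each oracle round follow directly from the match/non-match counts: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the match proportion. A minimal sketch, consistent with the 35/35 split above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the fraction of the majority class; entropy is the
    binary (base-2 Shannon) entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Matches the 35 matches / 35 non-matches round above
print(purity_entropy(35, 35))  # (0.5, 1.0)
```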
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 182 weight vectors
Based on 35 matches and 35 non-matches
Classified 107 matches and 75 non-matches
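The SVM step that splits the remaining cluster can be sketched with scikit-learn, training on the oracle-labelled samples and partitioning the rest into predicted-match and predicted-non-match sub-clusters. The linear kernel and default parameters are assumptions, as the log does not name them:

```python
from sklearn.svm import SVC

def split_cluster(train_vecs, train_labels, cluster_vecs):
    """Train an SVM (linear kernel assumed) on the oracle-labelled
    vectors, then split the remaining cluster by its predictions."""
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(cluster_vecs)
    matches = [v for v, p in zip(cluster_vecs, preds) if p]
    non_matches = [v for v, p in zip(cluster_vecs, preds) if not p]
    return matches, non_matches

# Toy example: high-similarity vectors are matches
train = [[0.9, 0.9], [0.95, 1.0], [0.1, 0.2], [0.0, 0.1]]
labels = [True, True, False, False]
m, n = split_cluster(train, labels, [[0.85, 0.9], [0.05, 0.1]])
print(len(m), len(n))  # 1 1
```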
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 70
Size, purity, entropy, and estimated match proportion of clusters in queue:
(107, 0.5, 1.0, 0.5)
(75, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 35 / 35
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 75 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 42
Farthest first selection of 42 weight vectors from 75 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.488, 1.000, 0.231, 0.167, 0.105, 0.122, 0.000] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.619, 1.000, 0.103, 0.163, 0.129, 0.146, 0.213] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 42 weight vectors
The oracle will correctly classify 42 weight vectors and wrongly classify 0
Classified 4 matches and 38 non-matches
Purity of oracle classification: 0.905
Entropy of oracle classification: 0.454
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
Deleted 42 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing file: diverg(10)338_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 338), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)338_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 851
Number of entity ID pairs that occurred more than once: 0
Analyse set of 851 weight vectors
Containing 154 true matches and 697 true non-matches
(18.10% true matches)
Identified 815 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 787 (96.56%)
2 : 25 (3.07%)
3 : 2 (0.25%)
8 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 815 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 676
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 843
Number of unique weight vectors: 814
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(814, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 814 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 814 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 32 matches and 54 non-matches
Purity of oracle classification: 0.628
Entropy of oracle classification: 0.952
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 728 weight vectors
Based on 32 matches and 54 non-matches
Classified 156 matches and 572 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.627906976744186, 0.9522656254366642, 0.37209302325581395)
(572, 0.627906976744186, 0.9522656254366642, 0.37209302325581395)
Current size of match and non-match training data sets: 32 / 54
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 572 weight vectors
- Estimated match proportion 0.372
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 572 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.571, 0.333, 0.444, 0.400] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing file: diverg(10)135_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 135), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)135_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 547
Number of entity ID pairs that occurred more than once: 0
Analyse set of 547 weight vectors
Containing 166 true matches and 381 true non-matches
(30.35% true matches)
Identified 529 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 520 (98.30%)
2 : 6 (1.13%)
3 : 2 (0.38%)
9 : 1 (0.19%)
Identified 1 non-pure unique weight vector (from 529 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 148
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 380
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 538
Number of unique weight vectors: 528
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(528, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 528 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 528 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.850, 0.733, 0.652, 0.778, 0.474] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 28 matches and 53 non-matches
Purity of oracle classification: 0.654
Entropy of oracle classification: 0.930
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 447 weight vectors
Based on 28 matches and 53 non-matches
Classified 124 matches and 323 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(124, 0.654320987654321, 0.9301497323974337, 0.345679012345679)
(323, 0.654320987654321, 0.9301497323974337, 0.345679012345679)
Current size of match and non-match training data sets: 28 / 53
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 323 weight vectors
- Estimated match proportion 0.346
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 323 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.400, 0.733, 0.667, 0.647, 0.737] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.800, 0.684, 0.667, 0.529, 0.609] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 1 match and 67 non-matches
Purity of oracle classification: 0.985
Entropy of oracle classification: 0.111
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
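The purity and entropy figures reported in blocks like the one above follow directly from the match/non-match counts: purity is the majority-class fraction, entropy the binary Shannon entropy of the match proportion. A minimal sketch (the exact implementation behind this log is an assumption; only the formulas are reproduced):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    # Purity: fraction of the majority class among the classified vectors.
    # Entropy: binary Shannon entropy (base 2) of the match proportion.
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# Reproduces the figures above: 1 match, 67 non-matches.
purity, entropy = purity_entropy(1, 67)
print(round(purity, 3), round(entropy, 3))  # 0.985 0.111
```

The same function reproduces the later blocks too, e.g. 30 matches and 57 non-matches give purity 0.655 and entropy 0.929.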
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(20)856_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 856), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)856_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1059
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1059 weight vectors
Containing 227 true matches and 832 true non-matches
(21.44% true matches)
Identified 1002 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 965 (96.31%)
2 : 34 (3.39%)
3 : 2 (0.20%)
20 : 1 (0.10%)
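A frequency distribution like the one above comes from counting twice: first how often each distinct weight vector occurs, then how many distinct vectors share each occurrence count. A sketch (the toy data is hypothetical; the real loader's behaviour beyond this counting is an assumption):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # Count occurrences per distinct vector, then tally how many
    # distinct vectors share each occurrence count.
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())

# Hypothetical toy data: one vector occurs twice, two occur once.
dist = occurrence_distribution([(0.5, 1.0), (0.5, 1.0), (0.2, 0.3), (0.9, 0.1)])
print(sorted(dist.items()))  # [(1, 2), (2, 1)]
```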
Identified 1 non-pure unique weight vector (from 1002 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 811
Removed 1 non-pure weight vector
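The removal step above drops the minority-class copies of each non-pure unique weight vector (here, the single non-match among the 20 copies with pureness 0.950). A sketch of that rule; the tie behaviour is an assumption, since the log never shows a 50/50 group:

```python
def keep_majority_copies(labels):
    # labels: True/False match labels of all copies of one identical
    # weight vector. A non-pure group keeps only its majority class.
    n_true = sum(labels)
    n_false = len(labels) - n_true
    if n_true > n_false:
        return [True] * n_true
    if n_false > n_true:
        return [False] * n_false
    return list(labels)  # tie: assumption, behaviour not shown in the log

# 20 copies with pureness 0.950: 19 matches, 1 non-match -> 1 removed.
kept = keep_majority_copies([True] * 19 + [False])
print(len(kept))  # 19
```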
Final number of weight vectors to use: 1058
Number of unique weight vectors: 1002
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1002, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1002 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1002 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
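The farthest-first selections listed above can be sketched as a greedy traversal that repeatedly picks the vector most distant from everything already selected. Euclidean distance and seeding with the first vector are assumptions; the log states neither:

```python
import math

def farthest_first(vectors, k):
    # Greedy farthest-first traversal: seed with the first vector, then
    # repeatedly add the vector whose distance to its nearest
    # already-selected vector is largest.
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

sample = farthest_first([(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.5, 0.5)], 2)
print(sample)  # [(0.0, 0.0), (1.0, 1.0)]
```

This greedy scheme spreads the sample across the whole cluster, which is why the listings above mix very dissimilar vectors rather than near-duplicates.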
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 915 weight vectors
Based on 30 matches and 57 non-matches
Classified 177 matches and 738 non-matches
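The split above trains a binary classifier on the 30 + 57 oracle-labelled vectors and partitions the remaining 915 by its predictions. As a library-free stand-in for the SVM (nearest class centroid instead of a maximum-margin hyperplane, purely for illustration of the split):

```python
import math

def centroid_split(train_matches, train_non_matches, unlabeled):
    # Assign each unlabelled vector to the class whose training-data
    # centroid is nearer; a stand-in for the SVM step named in the log.
    def centroid(vectors):
        return [sum(c) / len(vectors) for c in zip(*vectors)]
    c_match = centroid(train_matches)
    c_non = centroid(train_non_matches)
    matches, non_matches = [], []
    for v in unlabeled:
        (matches if math.dist(v, c_match) <= math.dist(v, c_non)
         else non_matches).append(v)
    return matches, non_matches
```

Either way, the two predicted groups become the two new clusters pushed onto the queue, which is why the queue length grows to 2 in the next loop.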
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(177, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(738, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 738 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 738 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.667, 0.538, 0.455, 0.581, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)175_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 175), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)175_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 617
Number of entity ID pairs that occurred more than once: 0
Analyse set of 617 weight vectors
Containing 156 true matches and 461 true non-matches
(25.28% true matches)
Identified 581 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 553 (95.18%)
2 : 25 (4.30%)
3 : 2 (0.34%)
8 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 581 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 140
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 440
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 609
Number of unique weight vectors: 580
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(580, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 580 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 580 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 28 matches and 54 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 498 weight vectors
Based on 28 matches and 54 non-matches
Classified 119 matches and 379 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(119, 0.6585365853658537, 0.9262122127346665, 0.34146341463414637)
(379, 0.6585365853658537, 0.9262122127346665, 0.34146341463414637)
Current size of match and non-match training data sets: 28 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 379 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 379 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.462, 0.609, 0.684, 0.308, 0.545] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 2 matches and 68 non-matches
Purity of oracle classification: 0.971
Entropy of oracle classification: 0.187
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 68
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing the file: diverg(10)724_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 724), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)724_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 746
Number of entity ID pairs that occurred more than once: 0
Analyse set of 746 weight vectors
Containing 220 true matches and 526 true non-matches
(29.49% true matches)
Identified 692 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 655 (94.65%)
2 : 34 (4.91%)
3 : 2 (0.29%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 692 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 505
Removed 1 non-pure weight vector
Final number of weight vectors to use: 745
Number of unique weight vectors: 692
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(692, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 692 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 692 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
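The "far" selection step above can be sketched as a greedy farthest-first traversal. The sketch below is illustrative only (the actual implementation behind this log is not shown); starting from an arbitrary vector, Euclidean distance is assumed.

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal (illustrative sketch; the real
    selection code behind this log is not shown). Starts from the first
    vector, then repeatedly adds the vector whose distance to its nearest
    already-selected vector is largest."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Distance from a candidate to the closest already-selected vector.
        def min_dist(v):
            return min(math.dist(v, s) for s in selected)
        farthest = max(remaining, key=min_dist)
        remaining.remove(farthest)
        selected.append(farthest)
    return selected

# Toy 2-D example: the two opposite corners are the mutually farthest pair.
sample = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.9, 0.1]]
picked = farthest_first(sample, 2)  # [[0.0, 0.0], [1.0, 1.0]]
```

In the run above the same idea is applied to 7-dimensional weight vectors, picking 84 out of 692.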
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
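The purity and entropy figures reported for the oracle sample (0.631 and 0.950 for 31 matches and 53 non-matches) follow from the majority-class fraction and the binary Shannon entropy of the match/non-match split. A minimal sketch, consistent with the numbers above:

```python
import math

def purity_entropy(num_match, num_non_match):
    """Purity = majority-class fraction; entropy = binary Shannon entropy
    (base 2) of the match / non-match split."""
    total = num_match + num_non_match
    p = num_match / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(31, 53)
# purity ≈ 0.631, entropy ≈ 0.950, as reported in the log above
```

A perfectly pure sample, such as the 44 matches / 0 non-matches seen later in this log, yields purity 1.000 and entropy 0.000.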
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 608 weight vectors
Based on 31 matches and 53 non-matches
Classified 300 matches and 308 non-matches
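The SVM step above (classifying the remaining 608 vectors using the 31 + 53 oracle-labelled vectors as training data) can be sketched as follows. scikit-learn's `SVC` and the linear kernel are assumptions; the log does not name the implementation, and toy 2-D data stands in for the 7-dimensional weight vectors.

```python
# Assumption: scikit-learn's SVC stands in for the unnamed SVM in the log.
from sklearn.svm import SVC

# Oracle-labelled vectors become the training set: 1 = match, 0 = non-match.
train_X = [[0.9, 0.95], [0.85, 1.0], [0.1, 0.2], [0.15, 0.05]]
train_y = [1, 1, 0, 0]

clf = SVC(kernel="linear")
clf.fit(train_X, train_y)

# Remaining unlabelled vectors are split into a predicted-match and a
# predicted-non-match cluster, which are both pushed back onto the queue.
unlabelled = [[0.8, 0.9], [0.2, 0.1]]
pred = clf.predict(unlabelled)  # array([1, 0])
```

The two predicted clusters (here 300 and 308 vectors) inherit the purity, entropy, and estimated match proportion of the oracle sample until they are sampled themselves.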
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(300, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(308, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 300 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 300 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.261, 0.174, 0.148, 0.186, 0.148] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.600, 1.000, 0.217, 0.132, 0.167, 0.125, 0.188] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 42 matches and 27 non-matches
Purity of oracle classification: 0.609
Entropy of oracle classification: 0.966
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 27
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)935_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 935), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)935_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1069
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1069 weight vectors
Containing 221 true matches and 848 true non-matches
(20.67% true matches)
Identified 1013 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 977 (96.45%)
2 : 33 (3.26%)
3 : 2 (0.20%)
20 : 1 (0.10%)
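The occurrence distribution above (how many unique vectors occur once, twice, and so on) amounts to counting duplicates twice over; a minimal sketch with illustrative data:

```python
from collections import Counter

# Each weight vector is a tuple of similarity scores; illustrative data.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.3, 0.7), (0.9, 0.9)]

per_vector = Counter(vectors)                # how often each unique vector occurs
distribution = Counter(per_vector.values())  # occurrence -> number of vectors
# distribution == {2: 1, 1: 2}: one vector occurs twice, two occur once
```

Summing occurrence times count recovers the total number of weight vectors, as in the log (977·1 + 33·2 + 2·3 + 1·20 = 1069).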
Identified 1 non-pure unique weight vector (from 1013 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 827
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1068
Number of unique weight vectors: 1013
Time to load and analyse the weight vector file: 0.01 sec
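The clean-up step reported above (the one vector with pureness 0.950 loses its single minority-class copy, 1069 → 1068) can be sketched like this. The data is illustrative, chosen so one vector occurs 20 times with a 19/20 match ratio, mirroring the log.

```python
from collections import defaultdict

# Illustrative data: one vector occurs 20 times (19 matches, 1 non-match,
# pureness 0.950) and one occurs once as a pure non-match.
pairs = [((1.0, 0.5), True)] * 19 + [((1.0, 0.5), False), ((0.2, 0.1), False)]

by_vector = defaultdict(list)
for vec, label in pairs:
    by_vector[vec].append(label)

cleaned = []
for vec, labels in by_vector.items():
    matches = sum(labels)
    pureness = matches / len(labels)
    if 0.0 < pureness < 1.0:
        # Non-pure vector: keep only the majority-class copies.
        majority = pureness >= 0.5
        cleaned += [(vec, majority)] * max(matches, len(labels) - matches)
    else:
        cleaned += [(vec, labels[0])] * len(labels)

# len(pairs) == 21, len(cleaned) == 20: one minority-class copy removed.
```

After this step every surviving weight vector carries a single unambiguous label, which is what the oracle and SVM stages rely on.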
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1013, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1013 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1013 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 926 weight vectors
Based on 23 matches and 64 non-matches
Classified 106 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(106, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 106 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 106 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 44 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(15)424_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (15, 1 - acm diverg, 424), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)424_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 766
Number of entity ID pairs that occurred more than once: 0
Analyse set of 766 weight vectors
Containing 187 true matches and 579 true non-matches
(24.41% true matches)
Identified 742 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 729 (98.25%)
2 : 10 (1.35%)
3 : 2 (0.27%)
11 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 742 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 576
Removed 1 non-pure weight vector
Final number of weight vectors to use: 765
Number of unique weight vectors: 742
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(742, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 742 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 742 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 657 weight vectors
Based on 30 matches and 55 non-matches
Classified 127 matches and 530 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(127, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(530, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 127 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 127 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 48 matches and 4 non-matches
Purity of oracle classification: 0.923
Entropy of oracle classification: 0.391
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
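The purity and entropy figures reported after each oracle classification can be reproduced from the match / non-match counts alone. A minimal sketch, taking purity as the majority-class fraction and entropy as the binary Shannon entropy (in bits) of the match proportion:

```python
import math

def purity_entropy(num_match, num_non_match):
    """Purity = fraction of the majority class; entropy = binary Shannon
    entropy of the match proportion, in bits."""
    total = num_match + num_non_match
    p = num_match / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

For 48 matches and 4 non-matches this yields purity ≈ 0.923 and entropy ≈ 0.391, matching the oracle report above; a perfectly pure sample (e.g. 0 matches, 65 non-matches) yields 1.000 and 0.000.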
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(10)599_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 599), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)599_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1011
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1011 weight vectors
Containing 196 true matches and 815 true non-matches
(19.39% true matches)
Identified 969 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 934 (96.39%)
2 : 32 (3.30%)
3 : 2 (0.21%)
7 : 1 (0.10%)
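The frequency distribution above (how many weight vectors occur once, twice, and so on) is a count of counts. A minimal sketch using only the standard library:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of unique weight vectors
    that occur exactly that often."""
    per_vector = Counter(tuple(v) for v in vectors)  # vector -> occurrences
    return Counter(per_vector.values())              # occurrences -> count
```

The first `Counter` also gives the number of unique weight vectors directly (its length), as reported in the log.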
Identified 0 non-pure unique weight vectors (from 969 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.000 : 795
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1011
Number of unique weight vectors: 969
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(969, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 969 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 969 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
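The "far" selection named above can be sketched as a greedy farthest-first traversal: after a start vector, repeatedly add the vector whose minimum distance to the already-selected set is largest. The actual tool's seeding strategy and distance metric are not shown in this log, so a fixed start index and Euclidean distance are assumptions here:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k vectors (assumed Euclidean
    distance and a fixed start index -- both illustrative choices)."""
    selected = [start]
    # Minimum distance from each vector to the current selected set.
    dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=dist.__getitem__)
        selected.append(nxt)
        dist = [min(d, math.dist(v, vectors[nxt]))
                for d, v in zip(dist, vectors)]
    return selected
```

Each step costs one distance pass over all vectors, so selecting k of n vectors is O(k·n) distance computations.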
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 25 matches and 62 non-matches
Purity of oracle classification: 0.713
Entropy of oracle classification: 0.865
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 882 weight vectors
Based on 25 matches and 62 non-matches
Classified 98 matches and 784 non-matches
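The log does not reveal which SVM implementation or kernel the tool uses to classify the remaining weight vectors. Purely as an illustration, a linear SVM can be trained on the oracle-labelled samples by sub-gradient descent on the regularised hinge loss; all hyperparameters (`lam`, `lr`, `epochs`) below are illustrative:

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Tiny linear SVM via sub-gradient descent on the hinge loss.
    y must be +1 (match) / -1 (non-match)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1.0:   # inside the margin: hinge sub-gradient step
                w = [wj + lr * (y[i] * xj - lam * wj)
                     for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:              # only the regulariser pulls on w
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1
```

Once trained on the labelled matches and non-matches, the classifier splits the unlabelled remainder of the cluster into a predicted-match and a predicted-non-match sub-cluster, as the loop output shows.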
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(98, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
(784, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 25 / 62
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 784 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 784 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 16 matches and 55 non-matches
Purity of oracle classification: 0.775
Entropy of oracle classification: 0.770
Number of true matches: 16
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(10)111_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 111), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)111_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 605
Number of entity ID pairs that occurred more than once: 0
Analyse set of 605 weight vectors
Containing 154 true matches and 451 true non-matches
(25.45% true matches)
Identified 569 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 541 (95.08%)
2 : 25 (4.39%)
3 : 2 (0.35%)
8 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 569 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 430
Removed 8 non-pure weight vectors
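Pureness of a unique weight vector is the fraction of its occurrences that are true matches; vectors whose pureness lies strictly between 0 and 1 are the non-pure ones removed above. A minimal sketch (a vector occurring 8 times, 7 of them matches, has pureness 7/8 = 0.875, consistent with the group removed here):

```python
from collections import defaultdict

def pureness_of_unique_vectors(vectors, labels):
    """Group identical weight vectors and report each unique vector's
    match fraction ('pureness')."""
    groups = defaultdict(list)
    for v, is_match in zip(vectors, labels):
        groups[tuple(v)].append(is_match)
    return {v: sum(ls) / len(ls) for v, ls in groups.items()}
```

Removing every occurrence of a non-pure unique vector (8 here) explains why the final vector count drops by more than the number of non-pure unique vectors (1 here).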
Final number of weight vectors to use: 597
Number of unique weight vectors: 568
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(568, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 568 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 568 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 25 matches and 57 non-matches
Purity of oracle classification: 0.695
Entropy of oracle classification: 0.887
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 486 weight vectors
Based on 25 matches and 57 non-matches
Classified 155 matches and 331 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(155, 0.6951219512195121, 0.8871723027673717, 0.3048780487804878)
(331, 0.6951219512195121, 0.8871723027673717, 0.3048780487804878)
Current size of match and non-match training data sets: 25 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.89
- Size 331 weight vectors
- Estimated match proportion 0.305
Sample size for this cluster: 65
Farthest first selection of 65 weight vectors from 331 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.269, 0.478, 0.750, 0.385, 0.455] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.538, 0.600, 0.471, 0.632, 0.688] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.500, 0.571, 0.467, 0.467, 0.389] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.179, 0.500, 0.412, 0.357] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.800, 0.667, 0.381, 0.550, 0.429] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.571, 0.286, 0.333, 0.571, 0.600] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 65 weight vectors
The oracle will correctly classify 65 weight vectors and wrongly classify 0
Classified 0 matches and 65 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 65 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing file: diverg(20)822_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 822), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)822_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 969
Number of entity ID pairs that occurred more than once: 0
Analyse set of 969 weight vectors
Containing 219 true matches and 750 true non-matches
(22.60% true matches)
Identified 914 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 878 (96.06%)
2 : 33 (3.61%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 914 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 729
Removed 1 non-pure weight vector
Final number of weight vectors to use: 968
Number of unique weight vectors: 914
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(914, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 914 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 914 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
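The purity and entropy figures reported for each oracle-classified sample follow the usual definitions: purity is the majority-class fraction, and entropy is the binary Shannon entropy (in bits) of the match proportion. A minimal sketch (the function name is hypothetical, not the tool's own API):

```python
import math

def purity_and_entropy(num_matches: int, num_non_matches: int):
    """Purity is the majority-class fraction; entropy is the binary
    Shannon entropy of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# For 24 matches and 63 non-matches this reproduces the figures
# reported above: purity ~ 0.724, entropy ~ 0.850.
```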
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 827 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 704 non-matches
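The SVM step trains on the oracle-labelled sample and then splits the remaining weight vectors of the cluster into predicted matches and non-matches. A minimal sketch assuming scikit-learn's `SVC` is available; the kernel choice and function name are assumptions, not the tool's actual configuration:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_matches, train_non_matches, unlabeled):
    """Train a binary SVM on the oracle-labelled weight vectors and
    split the remaining vectors into predicted matches / non-matches."""
    X = np.array(train_matches + train_non_matches)
    y = np.array([1] * len(train_matches) + [0] * len(train_non_matches))
    clf = SVC(kernel="linear").fit(X, y)
    pred = clf.predict(np.array(unlabeled))
    matches = [v for v, p in zip(unlabeled, pred) if p == 1]
    non_matches = [v for v, p in zip(unlabeled, pred) if p == 0]
    return matches, non_matches
```

The two resulting sub-clusters are what re-enter the queue in the next loop, each inheriting the purity/entropy estimate of the parent sample.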
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(704, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 123 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
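The "farthest first selection" step logged above is a greedy farthest-point traversal: start from one vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. A sketch under the assumption of Euclidean distance (function name hypothetical, not the tool's own code):

```python
import math
import random

def farthest_first(vectors, k, seed=42):
    """Greedy farthest-first traversal: seed with a random vector, then
    repeatedly pick the vector that maximises the distance to its
    nearest already-selected vector."""
    rng = random.Random(seed)
    selected = [vectors[rng.randrange(len(vectors))]]
    while len(selected) < k:
        best, best_dist = None, -1.0
        for v in vectors:
            d = min(math.dist(v, s) for s in selected)
            if d > best_dist:
                best, best_dist = v, d
        selected.append(best)
    return selected
```

This favours outliers and cluster boundaries, which is why the selected samples above span both clear matches and clear non-matches.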
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 47 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)846_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 846), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)846_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1058
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1058 weight vectors
Containing 209 true matches and 849 true non-matches
(19.75% true matches)
Identified 1011 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 976 (96.54%)
2 : 32 (3.17%)
3 : 2 (0.20%)
12 : 1 (0.10%)
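A distribution like the one above is two nested counts: how often each distinct weight vector occurs, then how many distinct vectors share each occurrence count. A sketch using the standard library (function name hypothetical):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Count how often each distinct weight vector occurs, then
    tabulate how many distinct vectors occur each number of times."""
    vec_counts = Counter(tuple(v) for v in vectors)
    freq_of_freq = Counter(vec_counts.values())
    return dict(sorted(freq_of_freq.items()))

# Toy input: one vector occurs 3 times, one twice, one once.
vectors = [[1.0, 0.5]] * 3 + [[0.2, 0.2]] * 2 + [[0.9, 0.1]]
```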
Identified 1 non-pure unique weight vector (from 1011 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
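The removal step above can be read as: for any distinct weight vector that carries both match and non-match labels, drop the copies belonging to the minority class, so every remaining unique vector is pure. A plausible sketch; the names and the tie-handling rule are assumptions, not the tool's documented behaviour:

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """For every distinct weight vector that carries both match (1) and
    non-match (0) labels, drop the copies from the minority class."""
    groups = defaultdict(list)
    for v, lab in zip(vectors, labels):
        groups[tuple(v)].append(lab)
    kept_v, kept_l = [], []
    for v, lab in zip(vectors, labels):
        labs = groups[tuple(v)]
        majority = sum(labs) * 2 >= len(labs)  # ties kept as matches (assumption)
        if bool(lab) == majority:
            kept_v.append(v)
            kept_l.append(lab)
    return kept_v, kept_l
```

For the 0.917-pure vector above this drops only its minority (non-match) copy, leaving 1057 of 1058 vectors.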
Final number of weight vectors to use: 1057
Number of unique weight vectors: 1011
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1011, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1011 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1011 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 924 weight vectors
Based on 23 matches and 64 non-matches
Classified 104 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(104, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(10)729_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 729), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)729_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 707
Number of entity ID pairs that occurred more than once: 0
Analyse set of 707 weight vectors
Containing 208 true matches and 499 true non-matches
(29.42% true matches)
Identified 673 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 656 (97.47%)
2 : 14 (2.08%)
3 : 2 (0.30%)
17 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 673 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 496
Removed 1 non-pure weight vector
Final number of weight vectors to use: 706
Number of unique weight vectors: 673
Time to load and analyse the weight vector file: 0.04 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(673, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 673 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 673 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 35 matches and 49 non-matches
Purity of oracle classification: 0.583
Entropy of oracle classification: 0.980
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 49
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 589 weight vectors
Based on 35 matches and 49 non-matches
Classified 279 matches and 310 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(279, 0.5833333333333334, 0.9798687566511527, 0.4166666666666667)
(310, 0.5833333333333334, 0.9798687566511527, 0.4166666666666667)
Current size of match and non-match training data sets: 35 / 49
Selected cluster with (queue ordering: random):
- Purity 0.58 and entropy 0.98
- Size 279 weight vectors
- Estimated match proportion 0.417
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 279 vectors
The selected farthest weight vectors are:
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 46 matches and 24 non-matches
Purity of oracle classification: 0.657
Entropy of oracle classification: 0.928
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 24
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(20)352_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 352), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)352_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
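The "far" method used above is a greedy farthest-first traversal: start from one vector, then repeatedly pick the vector whose distance to the nearest already-selected vector is largest. A minimal sketch, assuming Euclidean distance (the actual metric is not stated in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k indices from a list of tuples."""
    selected = [start]
    # distance from every vector to its nearest selected vector so far
    nearest = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=nearest.__getitem__)
        selected.append(nxt)
        nearest = [min(d, math.dist(v, vectors[nxt]))
                   for v, d in zip(vectors, nearest)]
    return selected

# 2-D demo: the four corners are picked before the interior point
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
print(farthest_first(pts, 4))  # [0, 3, 1, 2]
```

This greedy choice spreads the oracle's budget over the cluster, which is why the selected vectors above mix clear matches and clear non-matches.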
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
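The purity and entropy figures reported for each oracle sample follow the usual binary definitions (majority-class fraction and Shannon entropy in bits); a sketch reproducing the 23-match / 64-non-match numbers above:

```python
import math

def purity(n_match, n_nonmatch):
    """Fraction of the sample belonging to the majority class."""
    return max(n_match, n_nonmatch) / (n_match + n_nonmatch)

def entropy(n_match, n_nonmatch):
    """Binary Shannon entropy (bits) of the match / non-match split."""
    total = n_match + n_nonmatch
    return -sum(n / total * math.log2(n / total)
                for n in (n_match, n_nonmatch) if n)

print(round(purity(23, 64), 3), round(entropy(23, 64), 3))  # 0.736 0.833
```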
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
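The SVM step trains on the oracle-labelled vectors and propagates labels to the rest of the cluster. Assuming scikit-learn's `SVC` (kernel unspecified in the log) stands in for the real classifier, and using randomly generated stand-ins for the actual weight vectors, the step looks roughly like:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the 23 match / 64 non-match training vectors:
# matches concentrated near 1.0, non-matches spread lower
X_train = np.vstack([rng.uniform(0.7, 1.0, (23, 7)),
                     rng.uniform(0.0, 0.6, (64, 7))])
y_train = np.array([1] * 23 + [0] * 64)

clf = SVC()  # RBF kernel by default; an assumption, not confirmed by the log
clf.fit(X_train, y_train)

# Split the 932 remaining weight vectors into predicted matches / non-matches
X_rest = rng.uniform(0.0, 1.0, (932, 7))
pred = clf.predict(X_rest)
print(int(pred.sum()), "matches,", int((pred == 0).sum()), "non-matches")
```

The two predicted groups then become the two new clusters pushed onto the queue for the next loop.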
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 112 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 112 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and wrongly classify 0
Classified 44 matches and 1 non-match
Purity of oracle classification: 0.978
Entropy of oracle classification: 0.154
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)561_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 561), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)561_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 377
Number of entity ID pairs that occurred more than once: 0
Analyse set of 377 weight vectors
Containing 195 true matches and 182 true non-matches
(51.72% true matches)
Identified 350 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 334 (95.43%)
2 : 13 (3.71%)
3 : 2 (0.57%)
11 : 1 (0.29%)
Identified 1 non-pure unique weight vector (from 350 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 170
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 179
Removed 1 non-pure weight vector
Final number of weight vectors to use: 376
Number of unique weight vectors: 350
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(350, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 350 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 75
Perform initial selection using "far" method
Farthest first selection of 75 weight vectors from 350 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 31 matches and 44 non-matches
Purity of oracle classification: 0.587
Entropy of oracle classification: 0.978
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 275 weight vectors
Based on 31 matches and 44 non-matches
Classified 143 matches and 132 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 75
Size, purity, entropy, and estimated match proportion of clusters in queue:
(143, 0.5866666666666667, 0.9782176659354248, 0.41333333333333333)
(132, 0.5866666666666667, 0.9782176659354248, 0.41333333333333333)
Current size of match and non-match training data sets: 31 / 44
Selected cluster with (queue ordering: random):
- Purity 0.59 and entropy 0.98
- Size 143 weight vectors
- Estimated match proportion 0.413
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 143 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 51 matches and 6 non-matches
Purity of oracle classification: 0.895
Entropy of oracle classification: 0.485
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analyzing file: diverg(15)896_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 896), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)896_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1041
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1041 weight vectors
Containing 213 true matches and 828 true non-matches
(20.46% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 954 (96.46%)
2 : 32 (3.24%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 807
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1040
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 902 weight vectors
Based on 26 matches and 61 non-matches
Classified 44 matches and 858 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(44, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(858, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 858 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 858 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.000, 0.556, 0.182, 0.500, 0.071, 0.400] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.233, 0.545, 0.714, 0.455, 0.238] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 17 matches and 56 non-matches
Purity of oracle classification: 0.767
Entropy of oracle classification: 0.783
Number of true matches: 17
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
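The purity and entropy values reported for each oracle-classified sample (e.g. purity 0.767 and entropy 0.783 for the 17 matches and 56 non-matches above) follow the standard binary definitions: purity is the majority-class fraction of the sample, and entropy is the binary Shannon entropy of the match proportion. A minimal sketch (the function name is illustrative, not taken from the logged tool):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    if p in (0.0, 1.0):
        entropy = 0.0  # a pure sample has zero entropy
    else:
        entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return purity, entropy

# Reproduces the figures logged for 17 matches / 56 non-matches:
purity, entropy = purity_entropy(17, 56)
print(round(purity, 3), round(entropy, 3))  # 0.767 0.783
```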
53.0
Analysing file: diverg(15)304_NEW.csv
<class 'pandas.core.series.Series'>
Current row right here, folks!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 304), dtype: object
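The precision, recall and f-measure figures in this block follow directly from the tp/fp/fn counts shown (tp 58, fp 0, fn 241). A quick sanity check, with an illustrative helper name:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 (f-measure) from raw match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = prf(58, 0, 241)
print(p, round(r, 5), round(f, 5))  # 1.0 0.19398 0.32493
```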
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)304_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 755
Number of entity ID pairs that occurred more than once: 0
Analyse set of 755 weight vectors
Containing 203 true matches and 552 true non-matches
(26.89% true matches)
Identified 726 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 709 (97.66%)
2 : 14 (1.93%)
3 : 2 (0.28%)
12 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 726 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 176
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 549
Removed 1 non-pure weight vector
Final number of weight vectors to use: 754
Number of unique weight vectors: 726
Time to load and analyse the weight vector file: 0.01 sec
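The loading analysis above (identifying unique weight vectors, counting their occurrence frequencies, and removing minority-class copies of non-pure vectors) can be sketched roughly as follows. The input layout, a list of (vector, is_match) pairs, is an assumption about the tool's data, not its actual structure:

```python
from collections import Counter, defaultdict

def analyse_weight_vectors(labelled_vectors):
    """Group identical weight vectors, report occurrence frequencies,
    and drop minority-class copies of non-pure vectors."""
    groups = defaultdict(list)  # vector -> list of match labels
    for vec, is_match in labelled_vectors:
        groups[tuple(vec)].append(is_match)

    # occurrence count -> number of unique vectors occurring that often
    freq = Counter(len(labels) for labels in groups.values())

    kept = []
    for vec, labels in groups.items():
        matches = sum(labels)
        pureness = matches / len(labels)
        majority = pureness >= 0.5
        # keep only the majority-class copies of each unique vector
        kept.extend((vec, majority)
                    for _ in range(max(matches, len(labels) - matches)))
    return freq, kept

# Toy example: one vector seen 4 times (3 True, 1 False), one seen once.
data = [([0.9, 1.0], True)] * 3 + [([0.9, 1.0], False)] + [([0.1, 0.0], False)]
freq, kept = analyse_weight_vectors(data)
print(dict(freq), len(kept))  # {4: 1, 1: 1} 4
```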
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(726, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 726 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 726 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
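Farthest-first selection, used for the sample above, greedily adds whichever remaining vector maximises its minimum distance to the vectors already selected. A minimal sketch, assuming Euclidean distance and a fixed starting vector (the logged tool may choose its start differently):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k vectors."""
    selected = [vectors[start]]
    while len(selected) < k:
        # pick the vector whose nearest selected neighbour is farthest away
        best = max(
            (v for v in vectors if v not in selected),
            key=lambda v: min(math.dist(v, s) for s in selected),
        )
        selected.append(best)
    return selected

pts = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)]
print(farthest_first(pts, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```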
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 36 matches and 49 non-matches
Purity of oracle classification: 0.576
Entropy of oracle classification: 0.983
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 49
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 641 weight vectors
Based on 36 matches and 49 non-matches
Classified 307 matches and 334 non-matches
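The SVM split step trains on the oracle-labelled sample and partitions the cluster's remaining vectors into predicted matches and non-matches, producing the two child clusters that appear in the queue of the next loop. A sketch using scikit-learn's SVC; the kernel and parameters of the logged run are unknown, so a linear kernel is assumed here:

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining):
    """Train an SVM on oracle-labelled vectors, then split the rest
    into predicted matches and non-matches (two child clusters)."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining)
    matches = [v for v, p in zip(remaining, preds) if p]
    non_matches = [v for v, p in zip(remaining, preds) if not p]
    return matches, non_matches

# Toy example: two clearly separable classes.
train = [[0.9, 0.9], [1.0, 0.8], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
m, n = svm_split(train, labels, [[0.95, 0.85], [0.05, 0.15]])
print(len(m), len(n))  # 1 1
```

In the run above, the 36 matches and 49 non-matches serve as training data and the 641 unclassified vectors as `remaining`, yielding the 307/334 split.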
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(307, 0.5764705882352941, 0.9830605548016025, 0.4235294117647059)
(334, 0.5764705882352941, 0.9830605548016025, 0.4235294117647059)
Current size of match and non-match training data sets: 36 / 49
Selected cluster (queue ordering: random) with:
- Purity 0.58 and entropy 0.98
- Size 334 weight vectors
- Estimated match proportion 0.424
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 334 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.750, 0.905, 0.667, 0.500, 0.571] (False)
[1.000, 0.000, 0.579, 0.583, 0.522, 0.417, 0.563] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.429, 0.786, 0.750, 0.389, 0.857] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.769, 0.679, 0.412, 0.591, 0.500] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.556, 0.429, 0.500, 0.700, 0.643] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.500, 0.600, 0.353, 0.611, 0.526] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 0 matches and 73 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(15)100_NEW.csv
<class 'pandas.core.series.Series'>
Current row right here, folks!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 100), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)100_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 395
Number of entity ID pairs that occurred more than once: 0
Analyse set of 395 weight vectors
Containing 213 true matches and 182 true non-matches
(53.92% true matches)
Identified 358 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 340 (94.97%)
2 : 15 (4.19%)
3 : 2 (0.56%)
19 : 1 (0.28%)
Identified 1 non-pure unique weight vector (from 358 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 178
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 179
Removed 1 non-pure weight vector
Final number of weight vectors to use: 394
Number of unique weight vectors: 358
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(358, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 358 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 358 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 31 matches and 45 non-matches
Purity of oracle classification: 0.592
Entropy of oracle classification: 0.975
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 45
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 282 weight vectors
Based on 31 matches and 45 non-matches
Classified 151 matches and 131 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.5921052631578947, 0.9753817903274212, 0.40789473684210525)
(131, 0.5921052631578947, 0.9753817903274212, 0.40789473684210525)
Current size of match and non-match training data sets: 31 / 45
Selected cluster (queue ordering: random) with:
- Purity 0.59 and entropy 0.98
- Size 131 weight vectors
- Estimated match proportion 0.408
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 131 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 3 matches and 51 non-matches
Purity of oracle classification: 0.944
Entropy of oracle classification: 0.310
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(20)467_NEW.csv
<class 'pandas.core.series.Series'>
Current row right here, folks!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 467), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)467_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 956 non-matches
39.0
Analysing file: diverg(15)56_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984127
recall 0.207358
f-measure 0.342541
da 63
dm 0
ndm 0
tp 62
fp 1
tn 4.76529e+07
fn 237
Name: (15, 1 - acm diverg, 56), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)56_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 771
Number of entity ID pairs that occurred more than once: 0
Analyse set of 771 weight vectors
Containing 203 true matches and 568 true non-matches
(26.33% true matches)
Identified 721 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 687 (95.28%)
2 : 31 (4.30%)
3 : 2 (0.28%)
16 : 1 (0.14%)
Identified 1 non-pure unique weight vectors (from 721 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 547
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 770
Number of unique weight vectors: 721
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(721, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 721 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 721 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 30 matches and 54 non-matches
Purity of oracle classification: 0.643
Entropy of oracle classification: 0.940
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 637 weight vectors
Based on 30 matches and 54 non-matches
Classified 140 matches and 497 non-matches
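The purity and entropy figures reported above follow directly from the match/non-match counts: purity is the majority-class fraction, entropy the binary Shannon entropy of the split. A minimal stdlib sketch (function names are my own) that reproduces the Loop 1 values:

```python
import math

def purity(n_matches, n_non_matches):
    """Fraction of the majority class among the classified vectors."""
    total = n_matches + n_non_matches
    return max(n_matches, n_non_matches) / total

def entropy(n_matches, n_non_matches):
    """Binary Shannon entropy, in bits, of the match/non-match split."""
    total = n_matches + n_non_matches
    h = 0.0
    for count in (n_matches, n_non_matches):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

# Loop 1 above: 30 matches and 54 non-matches
print(f"{purity(30, 54):.3f}")   # 0.643
print(f"{entropy(30, 54):.3f}")  # 0.940
```

These match the full-precision values (0.6428571428571429, 0.9402859586706309) the log prints for the two child clusters in Loop 2.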
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
(497, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
Current size of match and non-match training data sets: 30 / 54
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 140 weight vectors
- Estimated match proportion 0.357
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 140 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 47 matches and 7 non-matches
Purity of oracle classification: 0.870
Entropy of oracle classification: 0.556
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
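The "farthest first" sampling step used above can be sketched as a greedy traversal: starting from one vector, repeatedly add the vector whose minimum Euclidean distance to the already-selected set is largest. The seeding rule in this sketch (take index 0) is an assumption; the log does not show how the tool picks its start vector.

```python
import numpy as np

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first selection of k row indices from a 2-D array.

    At each step, pick the vector whose distance to its nearest
    already-selected vector is maximal.
    """
    vectors = np.asarray(vectors, dtype=float)
    selected = [seed_index]
    # min_dist[i] = distance from vector i to the closest selected vector
    min_dist = np.linalg.norm(vectors - vectors[seed_index], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

# toy example: corners and centre of the unit square
pts = [[0, 0], [1, 1], [0.5, 0.5], [1, 0]]
print(farthest_first(pts, 3))  # [0, 1, 3]
```

Each iteration costs one distance pass over all vectors, so selecting k of n vectors is O(k·n) distance computations.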
63.0
Analysing file: diverg(15)591_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979592
recall 0.32107
f-measure 0.483627
da 98
dm 0
ndm 0
tp 96
fp 2
tn 4.76529e+07
fn 203
Name: (15, 1 - acm diverg, 591), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)591_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 678
Number of entity ID pairs that occurred more than once: 0
Analyse set of 678 weight vectors
Containing 167 true matches and 511 true non-matches
(24.63% true matches)
Identified 659 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 646 (98.03%)
2 : 10 (1.52%)
3 : 2 (0.30%)
6 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 659 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 150
0.000 : 509
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 678
Number of unique weight vectors: 659
Time to load and analyse the weight vector file: 0.01 sec
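The occurrence table above (occurrence : number of unique weight vectors occurring that often) is a histogram of a histogram: first count how often each distinct vector appears, then count how many vectors share each frequency. A minimal stdlib sketch, with a hypothetical function name:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to the number of unique vectors
    that occur exactly that often."""
    # Counter needs hashable keys, so each vector becomes a tuple
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())

vectors = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.1], [0.9, 0.1]]
print(sorted(occurrence_distribution(vectors).items()))  # [(1, 1), (2, 2)]
```

For this file the distribution is dominated by singletons (646 of 659 unique vectors occur once), which is why the number of unique vectors is so close to the total.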
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(659, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 659 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 659 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 32 matches and 52 non-matches
Purity of oracle classification: 0.619
Entropy of oracle classification: 0.959
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 575 weight vectors
Based on 32 matches and 52 non-matches
Classified 114 matches and 461 non-matches
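The split step above trains a classifier on the oracle-labelled vectors and propagates match/non-match labels to the remaining vectors of the cluster. A plausible sketch with scikit-learn; the use of `SVC` with a linear kernel is my assumption, since the log names the method only as "SVM classification" without stating the library or kernel:

```python
import numpy as np
from sklearn.svm import SVC

def split_by_svm(labelled_vectors, labels, unlabelled_vectors):
    """Train an SVM on oracle-labelled weight vectors and classify the
    remaining vectors of the cluster as match (True) / non-match (False)."""
    clf = SVC(kernel="linear")  # assumed kernel; the log does not say
    clf.fit(np.asarray(labelled_vectors), np.asarray(labels))
    return clf.predict(np.asarray(unlabelled_vectors))

# toy example: high-similarity vectors are matches, low-similarity are not
train_X = [[0.9, 0.9], [1.0, 0.8], [0.1, 0.2], [0.2, 0.1]]
train_y = [True, True, False, False]
print(split_by_svm(train_X, train_y, [[0.95, 0.85], [0.15, 0.15]]))
```

The predicted labels then define the two child clusters pushed onto the queue (here, 114 predicted matches and 461 predicted non-matches).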
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(114, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
(461, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
Current size of match and non-match training data sets: 32 / 52
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 461 weight vectors
- Estimated match proportion 0.381
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 461 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 3 matches and 73 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.240
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
98.0
Analysing file: diverg(10)517_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976
recall 0.408027
f-measure 0.575472
da 125
dm 0
ndm 0
tp 122
fp 3
tn 4.76529e+07
fn 177
Name: (10, 1 - acm diverg, 517), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)517_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 710
Number of entity ID pairs that occurred more than once: 0
Analyse set of 710 weight vectors
Containing 141 true matches and 569 true non-matches
(19.86% true matches)
Identified 676 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 647 (95.71%)
2 : 26 (3.85%)
3 : 2 (0.30%)
5 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 676 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 127
0.000 : 549
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 710
Number of unique weight vectors: 676
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(676, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 676 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 676 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
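The "Farthest first selection of k weight vectors from n vectors" steps that produce lists like the one above refer to a greedy farthest-first traversal over the 7-dimensional weight vectors. The log does not show the implementation; a minimal sketch, assuming Euclidean distance and the first vector as seed (both assumptions):

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first traversal: start from a seed vector, then
    repeatedly pick the vector whose distance to its nearest already-
    selected vector is largest.  Distance metric and seed are assumed."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [seed_index]
    # min_dist[i] = distance from vectors[i] to its closest selected vector
    min_dist = [dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected
```

Already-selected vectors have `min_dist` zero, so the traversal never picks the same vector twice and tends to spread the sample across the cluster, which is why the selected lists above mix extreme match-like and non-match-like vectors.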
Perform oracle with 100.00 accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 25 matches and 59 non-matches
Purity of oracle classification: 0.702
Entropy of oracle classification: 0.878
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
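The purity and entropy figures reported after each oracle round are consistent with the majority-class fraction and the binary Shannon entropy of the match/non-match split: 25 matches and 59 non-matches give purity 59/84 ≈ 0.702 and entropy ≈ 0.878, as above. A sketch of that computation:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the fraction of the majority class; entropy is the
    binary Shannon entropy (in bits) of the match/non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

With these definitions a perfectly pure cluster has purity 1.0 and entropy 0.0, matching the "pure enough" stopping criterion the log tests after each round.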
SVM classification of 592 weight vectors
Based on 25 matches and 59 non-matches
Classified 87 matches and 505 non-matches
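The "SVM classification of N weight vectors / Based on m matches and k non-matches" step trains a classifier on the oracle-labelled vectors and labels the remaining ones, splitting the cluster into a predicted-match and a predicted-non-match part. The log does not show the implementation (a library such as scikit-learn is likely); as a dependency-free illustration only, a Pegasos-style linear SVM sketch:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal Pegasos-style linear SVM (hinge loss, L2 regularisation,
    no bias term).  y must be +1 (match) / -1 (non-match)."""
    rng = random.Random(seed)
    n = len(X)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one pass in random order
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # example inside the margin: step towards it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def classify(w, x):
    """+1 = predicted match, -1 = predicted non-match."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1
```

This is a sketch of the technique, not the tool's actual SVM; the hyperparameters `lam` and `epochs` are illustrative assumptions.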
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(87, 0.7023809523809523, 0.8783609387702276, 0.2976190476190476)
(505, 0.7023809523809523, 0.8783609387702276, 0.2976190476190476)
Current size of match and non-match training data sets: 25 / 59
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 87 weight vectors
- Estimated match proportion 0.298
Sample size for this cluster: 42
Farthest first selection of 42 weight vectors from 87 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00 accuracy on 42 weight vectors
The oracle will correctly classify 42 weight vectors and wrongly classify 0
Classified 41 matches and 1 non-match
Purity of oracle classification: 0.976
Entropy of oracle classification: 0.162
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 42 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
125.0
Analyzing the file: diverg(15)642_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 642), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)642_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 855
Number of entity ID pairs that occurred more than once: 0
Analyse set of 855 weight vectors
Containing 221 true matches and 634 true non-matches
(25.85% true matches)
Identified 799 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 763 (95.49%)
2 : 33 (4.13%)
3 : 2 (0.25%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 799 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 613
Removed 1 non-pure weight vector
Final number of weight vectors to use: 854
Number of unique weight vectors: 799
Time to load and analyse the weight vector file: 0.01 sec
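The load-time analysis above (duplicate-frequency table, pureness per unique vector, removal of minority-class copies of non-pure vectors, e.g. the one vector occurring 20 times with pureness 0.950) can be sketched as follows; function and variable names are illustrative, not the tool's own:

```python
from collections import Counter, defaultdict

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors, build the occurrence-frequency
    table, compute pureness (fraction of true matches) per unique
    vector, and count minority-class copies of non-pure vectors that
    would be removed."""
    groups = defaultdict(list)
    for v, is_match in zip(vectors, labels):
        groups[tuple(v)].append(is_match)

    # occurrence count -> number of unique vectors occurring that often
    freq = Counter(len(g) for g in groups.values())

    pureness = {v: sum(g) / len(g) for v, g in groups.items()}
    non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]
    removed = sum(min(sum(groups[v]), len(groups[v]) - sum(groups[v]))
                  for v in non_pure)
    return freq, pureness, removed
```

For the file above this reproduces the numbers reported: one vector with 19 matches and 1 non-match has pureness 19/20 = 0.950, and removing its minority class drops exactly 1 weight vector (854 remaining from 855).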
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(799, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 799 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 799 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00 accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 714 weight vectors
Based on 28 matches and 57 non-matches
Classified 150 matches and 564 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(564, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 564 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 564 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00 accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 5 matches and 69 non-matches
Purity of oracle classification: 0.932
Entropy of oracle classification: 0.357
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analyzing the file: diverg(20)680_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 680), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)680_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00 accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 112 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 112 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00 accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and wrongly classify 0
Classified 44 matches and 1 non-match
Purity of oracle classification: 0.978
Entropy of oracle classification: 0.154
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing the file: diverg(10)622_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 622), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)622_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 455
Number of entity ID pairs that occurred more than once: 0
Analyse set of 455 weight vectors
Containing 219 true matches and 236 true non-matches
(48.13% true matches)
Identified 419 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 400 (95.47%)
2 : 16 (3.82%)
3 : 2 (0.48%)
17 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 419 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 233
Removed 1 non-pure weight vector
Final number of weight vectors to use: 454
Number of unique weight vectors: 419
Time to load and analyse the weight vector file: 0.00 sec
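The uniqueness and pureness analysis above (frequency distribution of occurrences, plus the fraction of matches per unique vector) can be reproduced with a pair of counters. A sketch under the assumption that each weight vector carries a True/False match label, with hypothetical names:

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Sketch of the analysis step logged above: how often each unique
    weight vector occurs, and the pureness (fraction of True labels)
    of each unique vector."""
    keys = [tuple(v) for v in vectors]
    occ = Counter(keys)                              # vector -> occurrence count
    matches = Counter(k for k, lab in zip(keys, labels) if lab)
    freq_dist = Counter(occ.values())                # occurrence : number of unique vectors
    pureness = {k: matches[k] / n for k, n in occ.items()}
    return freq_dist, pureness
```

A unique vector with pureness strictly between 0 and 1 is "non-pure"; the log removes its minority-class occurrences before clustering.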
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(419, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 419 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 419 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
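The "far" method used for the selections above is a farthest-first traversal: after an initial pick, each subsequent vector is the one maximising its minimum distance to the already-selected set. A sketch (Euclidean distance and the fixed starting index are assumptions; the tool may seed differently):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k vectors: repeatedly pick the
    vector whose nearest already-selected vector is farthest away."""
    selected = [vectors[start]]
    # distance of every vector to its nearest selected vector so far
    min_dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[idx])
        min_dist = [min(d, math.dist(v, vectors[idx]))
                    for d, v in zip(min_dist, vectors)]
    return selected
```

This greedy traversal spreads the sample across the cluster, which is why the selected vectors above mix clear matches and clear non-matches.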
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 40 matches and 38 non-matches
Purity of oracle classification: 0.513
Entropy of oracle classification: 1.000
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 341 weight vectors
Based on 40 matches and 38 non-matches
Classified 282 matches and 59 non-matches
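The SVM step above trains on the 40 + 38 oracle-labelled vectors and splits the remaining 341 into predicted matches and non-matches. A minimal sketch, assuming scikit-learn and a linear kernel (both are assumptions; the log does not say which implementation or kernel the tool uses):

```python
from sklearn.svm import SVC

def svm_split(labelled, labels, unlabelled):
    """Train an SVM on oracle-labelled weight vectors (labels 1 = match,
    0 = non-match), then split the remaining vectors by prediction."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(labelled, labels)
    predictions = clf.predict(unlabelled)
    matches = [v for v, p in zip(unlabelled, predictions) if p == 1]
    non_matches = [v for v, p in zip(unlabelled, predictions) if p == 0]
    return matches, non_matches
```

The two resulting subsets become the new clusters pushed onto the queue in the next loop.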
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(282, 0.5128205128205128, 0.9995256892936493, 0.5128205128205128)
(59, 0.5128205128205128, 0.9995256892936493, 0.5128205128205128)
Current size of match and non-match training data sets: 40 / 38
Selected cluster with (queue ordering: random):
- Purity 0.51 and entropy 1.00
- Size 282 weight vectors
- Estimated match proportion 0.513
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 282 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 44 matches and 28 non-matches
Purity of oracle classification: 0.611
Entropy of oracle classification: 0.964
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)575_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 575), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)575_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 883
Number of entity ID pairs that occurred more than once: 0
Analyse set of 883 weight vectors
Containing 212 true matches and 671 true non-matches
(24.01% true matches)
Identified 831 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 795 (95.67%)
2 : 33 (3.97%)
3 : 2 (0.24%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 831 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 650
Removed 1 non-pure weight vector
Final number of weight vectors to use: 882
Number of unique weight vectors: 831
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(831, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 831 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 831 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 745 weight vectors
Based on 29 matches and 57 non-matches
Classified 163 matches and 582 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(163, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(582, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 582 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 582 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 0 matches and 75 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 75
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(10)574_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984615
recall 0.214047
f-measure 0.351648
da 65
dm 0
ndm 0
tp 64
fp 1
tn 4.76529e+07
fn 235
Name: (10, 1 - acm diverg, 574), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)574_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 860
Number of entity ID pairs that occurred more than once: 0
Analyse set of 860 weight vectors
Containing 191 true matches and 669 true non-matches
(22.21% true matches)
Identified 813 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 780 (95.94%)
2 : 30 (3.69%)
3 : 2 (0.25%)
14 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 813 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 164
0.929 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 648
Removed 1 non-pure weight vector
Final number of weight vectors to use: 859
Number of unique weight vectors: 813
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(813, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 813 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 813 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 727 weight vectors
Based on 29 matches and 57 non-matches
Classified 146 matches and 581 non-matches
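The purity (0.663) and entropy (0.922) reported above are consistent with the majority-class fraction and the binary Shannon entropy of the 29 / 57 oracle split; a minimal sketch in plain Python, assuming exactly those two definitions:

```python
import math

def purity_entropy(n_match, n_non_match):
    """Majority-class purity and binary Shannon entropy of a two-class split."""
    total = n_match + n_non_match
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = purity_entropy(29, 57)
print(round(purity, 3), round(entropy, 3))  # 0.663 0.922
```

The same two figures reappear in the Loop 2 queue printout, where both child clusters inherit the parent's statistics.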
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(581, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 146 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 146 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 45 matches and 9 non-matches
Purity of oracle classification: 0.833
Entropy of oracle classification: 0.650
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
65.0
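The repeated "Farthest first selection of k weight vectors from n vectors" steps can be sketched as a greedy max-min procedure; a minimal illustration in plain Python, assuming Euclidean distance and an arbitrary first pick (the program's exact seeding and metric are not shown in the trace):

```python
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors, each maximising its distance
    to the closest already-selected vector."""
    selected = [vectors[0]]  # assumption: seed with the first vector
    while len(selected) < k:
        best, best_dist = None, -1.0
        for v in vectors:
            if v in selected:
                continue
            d = min(math.dist(v, s) for s in selected)
            if d > best_dist:
                best, best_dist = v, d
        selected.append(best)
    return selected

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(points, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

This is why the selected samples above mix extreme all-1.000 and all-near-0 vectors: the procedure favours the corners of the weight space over dense regions.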
Analysing file: diverg(15)724_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 724), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)724_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 812
Number of entity ID pairs that occurred more than once: 0
Analyse set of 812 weight vectors
Containing 226 true matches and 586 true non-matches
(27.83% true matches)
Identified 755 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 718 (95.10%)
2 : 34 (4.50%)
3 : 2 (0.26%)
20 : 1 (0.13%)
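The occurrence histogram above (how many unique vectors occur how often) can be reproduced with two nested Counter passes; a small sketch on hypothetical toy vectors:

```python
from collections import Counter

# Hypothetical toy data: four unique vectors, one of them duplicated.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.1), (0.4, 0.4)]

# First count how often each unique vector occurs, then count how
# many unique vectors share each occurrence count.
per_vector = Counter(vectors)
distribution = Counter(per_vector.values())

for occurrence, count in sorted(distribution.items()):
    share = 100.0 * count / len(per_vector)
    print(f"{occurrence} : {count} ({share:.2f}%)")
# 1 : 3 (75.00%)
# 2 : 1 (25.00%)
```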
Identified 1 non-pure unique weight vector (from 755 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 565
Removed 1 non-pure weight vector
Final number of weight vectors to use: 811
Number of unique weight vectors: 755
Time to load and analyse the weight vector file: 0.01 sec
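The non-pure vector handling above can be illustrated on a hypothetical unique vector occurring 20 times with pureness 0.950 (19 labels of one class, 1 of the other); a sketch assuming pureness is the match fraction and only the minority-class occurrence is dropped:

```python
# Hypothetical duplicate group: 19 True labels and 1 False label
# attached to the same weight vector (pureness 19/20 = 0.95).
labels = [True] * 19 + [False]
pureness = sum(labels) / len(labels)
majority = pureness >= 0.5
kept = [lab for lab in labels if lab == majority]  # drop minority occurrences
print(pureness, len(labels) - len(kept))  # 0.95 1
```

Dropping the single minority occurrence is what takes the trace from 812 to 811 weight vectors while leaving the 755 unique vectors intact.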
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(755, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 755 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 755 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 670 weight vectors
Based on 29 matches and 56 non-matches
Classified 165 matches and 505 non-matches
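The SVM step trains on the oracle-labelled sample (here 29 matches and 56 non-matches) and classifies the 670 remaining vectors into the two child clusters; a minimal sketch with scikit-learn's SVC on synthetic data — the kernel, parameters, and data are assumptions, not the program's actual settings:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical oracle-labelled training data: 7-dimensional weight
# vectors, with matches concentrated at high similarities.
matches = rng.uniform(0.7, 1.0, size=(29, 7))      # label 1
non_matches = rng.uniform(0.0, 0.4, size=(56, 7))  # label 0
X_train = np.vstack([matches, non_matches])
y_train = np.array([1] * 29 + [0] * 56)

clf = SVC(kernel="linear").fit(X_train, y_train)

# Classify the remaining, still-unlabelled weight vectors.
X_rest = rng.uniform(0.0, 1.0, size=(670, 7))
pred = clf.predict(X_rest)
print(f"Classified {int(pred.sum())} matches and {int((pred == 0).sum())} non-matches")
```

The two predicted subsets are then pushed back onto the queue, which is why the queue length grows to 2 in the next loop; the counts printed here will not match the trace, since the data is synthetic.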
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(165, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(505, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 165 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 165 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 47 matches and 10 non-matches
Purity of oracle classification: 0.825
Entropy of oracle classification: 0.670
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 10
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)790_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 790), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)790_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 696
Number of entity ID pairs that occurred more than once: 0
Analyse set of 696 weight vectors
Containing 208 true matches and 488 true non-matches
(29.89% true matches)
Identified 660 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 644 (97.58%)
2 : 13 (1.97%)
3 : 2 (0.30%)
20 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 660 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 487
Removed 1 non-pure weight vector
Final number of weight vectors to use: 695
Number of unique weight vectors: 660
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(660, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 660 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 660 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.667, 0.571, 0.500, 0.625] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 34 matches and 50 non-matches
Purity of oracle classification: 0.595
Entropy of oracle classification: 0.974
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 576 weight vectors
Based on 34 matches and 50 non-matches
Classified 304 matches and 272 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(304, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
(272, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
Current size of match and non-match training data sets: 34 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 272 weight vectors
- Estimated match proportion 0.405
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 272 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.667, 0.857, 0.353, 0.632, 0.550] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 0.000, 0.864, 0.667, 0.435, 0.700, 0.600] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.462, 0.609, 0.643, 0.706, 0.786] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.950, 0.000, 0.619, 0.800, 0.478, 0.280, 0.625] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 0 matches and 69 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)948_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 948), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)948_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 984
Number of entity ID pairs that occurred more than once: 0
Analyse set of 984 weight vectors
Containing 211 true matches and 773 true non-matches
(21.44% true matches)
Identified 932 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 897 (96.24%)
2 : 32 (3.43%)
3 : 2 (0.21%)
17 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 932 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 752
Removed 1 non-pure weight vector
Final number of weight vectors to use: 983
Number of unique weight vectors: 932
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(932, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 932 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 932 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
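The "far" initial selection logged above is a farthest-first traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch under that reading (the seeding and tie-breaking of the actual implementation may differ):

```python
import math

def farthest_first(vectors, k):
    """Select k vectors by farthest-first traversal, using Euclidean
    distance, starting from the first vector in the list."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [vectors[0]]
    # min_dist[i] = distance from vectors[i] to its nearest selected vector
    min_dist = [dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[idx]))
    return selected
```

This greedy traversal spreads the sample across the weight-vector space, which is why the selected vectors above mix clear matches and clear non-matches.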
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
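The purity and entropy figures reported for an oracle classification are consistent with the majority-class fraction and the binary (base-2) Shannon entropy of the match proportion. A small sketch:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity: majority-class fraction of the classified sample.
    Entropy: base-2 Shannon entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```

For the 31 matches and 56 non-matches above this gives purity ≈ 0.644 and entropy ≈ 0.940, matching the logged values.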
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 845 weight vectors
Based on 31 matches and 56 non-matches
Classified 292 matches and 553 non-matches
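The SVM step trains on the oracle-labelled vectors and splits the remaining unlabelled vectors into predicted matches and non-matches, forming the two clusters queued for the next loop. It plausibly looks like the following scikit-learn sketch; the kernel and parameters are not recorded in the log, so the `svm.SVC()` defaults here are an assumption:

```python
from sklearn import svm

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on oracle-labelled weight vectors and split the
    remaining vectors into predicted matches / non-matches."""
    clf = svm.SVC()  # kernel and parameters are an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches
```

In the run above, a classifier trained on 31 matches and 56 non-matches splits the 845 remaining vectors into 292 predicted matches and 553 predicted non-matches.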
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(292, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(553, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 292 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 292 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.600, 1.000, 0.217, 0.132, 0.167, 0.125, 0.188] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 42 matches and 26 non-matches
Purity of oracle classification: 0.618
Entropy of oracle classification: 0.960
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(20)742_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 742), dtype: object
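The f-measure values in these per-file summaries follow the usual harmonic-mean definition, e.g. precision 1 and recall 0.130435 give f-measure ≈ 0.230769. As a quick check (the function name is illustrative):

```python
def f_measure(precision, recall):
    """F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

The same formula reproduces the other per-file summaries, e.g. precision 0.983871 and recall 0.204013 give ≈ 0.33795.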
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)742_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as a percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)874_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.983871
recall 0.204013
f-measure 0.33795
da 62
dm 0
ndm 0
tp 61
fp 1
tn 4.76529e+07
fn 238
Name: (10, 1 - acm diverg, 874), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)874_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 661
Number of entity ID pairs that occurred more than once: 0
Analyse set of 661 weight vectors
Containing 197 true matches and 464 true non-matches
(29.80% true matches)
Identified 611 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 577 (94.44%)
2 : 31 (5.07%)
3 : 2 (0.33%)
16 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 611 unique weight vectors)
Pureness (as a percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 167
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 443
Removed 1 non-pure weight vector
Final number of weight vectors to use: 660
Number of unique weight vectors: 611
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(611, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 611 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 611 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 29 matches and 54 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.934
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 528 weight vectors
Based on 29 matches and 54 non-matches
Classified 148 matches and 380 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
(380, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
Current size of match and non-match training data sets: 29 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 380 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 380 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.462, 0.609, 0.684, 0.308, 0.545] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
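The farthest-first selections logged throughout can be sketched as a greedy traversal that repeatedly adds the vector whose minimum Euclidean distance to the already-selected set is largest (a common approximation; the seeding rule used by the original code is not visible in the log):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over Euclidean distance."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # minimum distance of every vector to the selected set so far
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

Each new pick maximises coverage of the similarity space, which is why the logged samples mix clear matches, clear non-matches, and borderline vectors.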
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 2 matches and 69 non-matches
Purity of oracle classification: 0.972
Entropy of oracle classification: 0.185
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
62.0
Analysing the file: diverg(20)86_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 86), dtype: object
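The precision, recall, and f-measure rows in these per-file summaries follow directly from the tp/fp/fn counts; for example, recall 0.133779 is 40 / (40 + 259). A minimal sketch (the function name is mine):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard classification metrics from true/false positive and
    false negative counts, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# counts from the summary above: tp=40, fp=0, fn=259
p, r, f = precision_recall_f1(tp=40, fp=0, fn=259)
```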
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)86_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
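The analysis above (unique vectors, their occurrence frequencies, and removal of the minority-class copies of non-pure vectors) can be sketched as follows (names are mine, not from the original code):

```python
from collections import Counter, defaultdict

def analyse(vectors, labels):
    """Group identical weight vectors, tabulate occurrence frequencies,
    and drop the minority-class copies of any non-pure unique vector."""
    groups = defaultdict(list)
    for v, is_match in zip(vectors, labels):
        groups[tuple(v)].append(is_match)
    # occurrence count -> number of unique vectors occurring that often
    freq = Counter(len(labs) for labs in groups.values())
    kept = []
    for v, labs in groups.items():
        n_match = sum(labs)
        majority_is_match = n_match > len(labs) - n_match
        n_keep = max(n_match, len(labs) - n_match)  # drop minority copies
        kept += [(v, majority_is_match)] * n_keep
    return freq, kept

# toy example: one vector occurring 3 times (2 matches, 1 non-match)
# and one pure singleton; one minority-class copy is removed
freq, kept = analyse([(1.0,), (1.0,), (1.0,), (0.5,)],
                     [True, True, False, False])
```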
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
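The oracle lines describe a configurable-accuracy human oracle: each queried label is returned correctly with the stated probability. At 100.00% accuracy no label is ever flipped, hence the "wrongly classify 0" lines. A sketch assuming this flip model (the original implementation is not shown in the log):

```python
import random

def oracle_classify(true_labels, accuracy=1.0, seed=0):
    """Return each true label, flipped with probability 1 - accuracy."""
    rng = random.Random(seed)
    return [lab if rng.random() < accuracy else not lab
            for lab in true_labels]

# with accuracy 1.0 the oracle reproduces the ground truth exactly
labels = oracle_classify([True] * 24 + [False] * 63, accuracy=1.0)
```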
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(15)797_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 797), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)797_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 586
Number of entity ID pairs that occurred more than once: 0
Analyse set of 586 weight vectors
Containing 196 true matches and 390 true non-matches
(33.45% true matches)
Identified 562 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 549 (97.69%)
2 : 10 (1.78%)
3 : 2 (0.36%)
11 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 562 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 172
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 389
Removed 1 non-pure weight vector
Final number of weight vectors to use: 585
Number of unique weight vectors: 562
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(562, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 562 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 562 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 27 matches and 55 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 480 weight vectors
Based on 27 matches and 55 non-matches
Classified 136 matches and 344 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
(344, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
Current size of match and non-match training data sets: 27 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 136 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 136 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(15)600_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 600), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)600_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 537
Number of entity ID pairs that occurred more than once: 0
Analyse set of 537 weight vectors
Containing 224 true matches and 313 true non-matches
(41.71% true matches)
Identified 498 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 479 (96.18%)
2 : 16 (3.21%)
3 : 2 (0.40%)
20 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 498 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 310
Removed 1 non-pure weight vector
Final number of weight vectors to use: 536
Number of unique weight vectors: 498
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(498, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 498 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 498 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
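The "farthest first" selection used above is a greedy farthest-first traversal: starting from one vector, each new pick maximizes its distance to the set already selected. A minimal sketch under that assumption (the function name, starting rule, and Euclidean metric are illustrative, not taken from the actual tool):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedily pick k vectors so that each new pick maximizes its
    distance to the nearest already-selected vector."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    # Start from a random vector; track each point's distance to the
    # nearest selected point and always take the farthest one.
    selected = [int(rng.integers(len(vectors)))]
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

Picking the farthest-apart weight vectors spreads the oracle budget across the cluster instead of labelling many near-duplicates.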
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 33 matches and 47 non-matches
Purity of oracle classification: 0.588
Entropy of oracle classification: 0.978
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 47
Number of false non-matches: 0
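The purity and entropy figures reported after each oracle round follow the standard two-class definitions (majority-class fraction and binary Shannon entropy). A small sketch with illustrative function names:

```python
from math import log2

def purity(n_match, n_non_match):
    """Fraction of the majority class among the classified vectors."""
    total = n_match + n_non_match
    return max(n_match, n_non_match) / total

def entropy(n_match, n_non_match):
    """Binary Shannon entropy (in bits) of the match/non-match split."""
    total = n_match + n_non_match
    h = 0.0
    for n in (n_match, n_non_match):
        p = n / total
        if p > 0:
            h -= p * log2(p)
    return h
```

For the 33 matches and 47 non-matches above this gives a purity of 0.5875 and an entropy of about 0.978, matching the logged values.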
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 418 weight vectors
Based on 33 matches and 47 non-matches
Classified 151 matches and 267 non-matches
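The SVM step trains on the oracle-labelled vectors and predicts the remaining ones, splitting the cluster into a predicted-match and a predicted-non-match part. A minimal scikit-learn sketch; the kernel and regularization parameters of the actual tool are not shown in the log, so the defaults here are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labelled, labels, unlabelled):
    """Train an SVM on oracle-classified weight vectors (labels: 1 for
    match, 0 for non-match) and split the remaining vectors into
    predicted matches and predicted non-matches."""
    clf = SVC()  # kernel/C left at defaults; the log does not state them
    clf.fit(labelled, labels)
    pred = clf.predict(unlabelled)
    return unlabelled[pred == 1], unlabelled[pred == 0]
```

The two resulting subsets are what gets pushed back onto the cluster queue for the next loop.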
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.5875, 0.9777945702913884, 0.4125)
(267, 0.5875, 0.9777945702913884, 0.4125)
Current size of match and non-match training data sets: 33 / 47
Selected cluster with (queue ordering: random):
- Purity 0.59 and entropy 0.98
- Size 151 weight vectors
- Estimated match proportion 0.412
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.909, 1.000, 1.000, 1.000, 0.947] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 51 matches and 7 non-matches
Purity of oracle classification: 0.879
Entropy of oracle classification: 0.531
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)49_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 49), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)49_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 768
Number of entity ID pairs that occurred more than once: 0
Analyse set of 768 weight vectors
Containing 216 true matches and 552 true non-matches
(28.12% true matches)
Identified 730 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 712 (97.53%)
2 : 15 (2.05%)
3 : 2 (0.27%)
20 : 1 (0.14%)
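The occurrence distribution above can be reproduced with a two-level count over the weight vectors: first count how often each distinct vector appears, then count how many distinct vectors share each frequency. A sketch (the real loader works on the parsed CSV rows; this function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence frequency to the number of distinct
    weight vectors that occur that often."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())
```

With 712 singleton vectors, 15 pairs, 2 triples, and one vector repeated 20 times it would return {1: 712, 2: 15, 3: 2, 20: 1}, as in the table above.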
Identified 1 non-pure unique weight vector (from 730 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 549
Removed 1 non-pure weight vector
Final number of weight vectors to use: 767
Number of unique weight vectors: 730
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(730, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 730 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 730 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 36 matches and 49 non-matches
Purity of oracle classification: 0.576
Entropy of oracle classification: 0.983
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 49
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 645 weight vectors
Based on 36 matches and 49 non-matches
Classified 276 matches and 369 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(276, 0.5764705882352941, 0.9830605548016025, 0.4235294117647059)
(369, 0.5764705882352941, 0.9830605548016025, 0.4235294117647059)
Current size of match and non-match training data sets: 36 / 49
Selected cluster with (queue ordering: random):
- Purity 0.58 and entropy 0.98
- Size 276 weight vectors
- Estimated match proportion 0.424
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 276 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.146, 0.130, 0.176, 0.318, 0.167] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.261, 0.174, 0.148, 0.186, 0.148] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.736, 1.000, 0.250, 0.290, 0.172, 0.188, 0.286] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.242, 0.121, 0.200, 0.171, 0.000] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 44 matches and 26 non-matches
Purity of oracle classification: 0.629
Entropy of oracle classification: 0.952
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(10)440_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 440), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)440_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 626
Number of entity ID pairs that occurred more than once: 0
Analyse set of 626 weight vectors
Containing 188 true matches and 438 true non-matches
(30.03% true matches)
Identified 605 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 591 (97.69%)
2 : 11 (1.82%)
3 : 2 (0.33%)
7 : 1 (0.17%)
Identified 0 non-pure unique weight vectors (from 605 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 167
0.000 : 438
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 626
Number of unique weight vectors: 605
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(605, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 605 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 605 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 32 matches and 51 non-matches
Purity of oracle classification: 0.614
Entropy of oracle classification: 0.962
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 522 weight vectors
Based on 32 matches and 51 non-matches
Classified 273 matches and 249 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(273, 0.6144578313253012, 0.9618624139909456, 0.3855421686746988)
(249, 0.6144578313253012, 0.9618624139909456, 0.3855421686746988)
Current size of match and non-match training data sets: 32 / 51
Selected cluster with (queue ordering: random):
- Purity 0.61 and entropy 0.96
- Size 249 weight vectors
- Estimated match proportion 0.386
Sample size for this cluster: 67
Farthest first selection of 67 weight vectors from 249 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.846, 0.684, 0.529, 0.727, 0.700] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.579, 0.583, 0.522, 0.417, 0.563] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.923, 0.667, 0.667, 0.412, 0.571] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.714, 0.500, 0.412, 0.762] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.750, 0.905, 0.667, 0.500, 0.571] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.296, 0.600, 0.471, 0.600, 0.643] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.533, 0.667, 0.333, 0.714, 0.632] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.433, 0.737, 0.706, 0.500, 0.800] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 67 weight vectors
The oracle will correctly classify 67 weight vectors and wrongly classify 0
Classified 0 matches and 67 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 67 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(10)200_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 200), dtype: object
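The f-measure in the per-file summary above is the harmonic mean of precision and recall; a minimal sketch (the function name `f_measure` is illustrative, not from the tool):

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values from the summary: precision 1, recall 0.143813
print(round(f_measure(1.0, 0.143813), 6))  # 0.251462, as reported above
```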
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)200_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 659
Number of entity ID pairs that occurred more than once: 0
Analyse set of 659 weight vectors
Containing 211 true matches and 448 true non-matches
(32.02% true matches)
Identified 607 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 571 (94.07%)
2 : 33 (5.44%)
3 : 2 (0.33%)
16 : 1 (0.16%)
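The occurrence distribution above (how many weight vectors appear once, twice, and so on) is a histogram of a histogram; a sketch with toy tuples standing in for the seven-weight vectors (the data below is illustrative, not from the log):

```python
from collections import Counter

# Toy weight vectors; in the tool each is a tuple of seven similarity weights.
vectors = [
    (1.0, 0.0), (1.0, 0.0), (1.0, 0.0),
    (0.5, 0.3),
    (0.2, 0.9), (0.2, 0.9),
    (0.7, 0.1),
]

freq = Counter(vectors)        # unique vector -> occurrence count
dist = Counter(freq.values())  # occurrence count -> number of unique vectors

print(len(freq))             # 4 unique weight vectors
print(sorted(dist.items()))  # [(1, 2), (2, 1), (3, 1)]
```

In the run above the same computation yields 607 unique vectors, of which 571 occur once, 33 twice, 2 three times and 1 sixteen times.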
Identified 1 non-pure unique weight vector (from 607 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 427
Removed 1 non-pure weight vector
Final number of weight vectors to use: 658
Number of unique weight vectors: 607
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(607, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 607 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 607 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
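The "far" initial selection above is a greedy farthest-first traversal; a minimal sketch, assuming Euclidean distance and seeding from the first vector (both are assumptions; the tool's metric and seeding may differ):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly add the vector whose
    Euclidean distance to its nearest already-selected vector is largest."""
    selected = [vectors[0]]          # seeding choice is an assumption
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        candidate = max(
            remaining,
            key=lambda v: min(math.dist(v, s) for s in selected),
        )
        selected.append(candidate)
        remaining.remove(candidate)
    return selected

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
print(farthest_first(points, 3))  # [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
```

This spreads the sample across the cluster, which is why the selected vectors above mix clear matches, clear non-matches and borderline cases.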
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
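The purity and entropy figures reported after each oracle step follow the standard majority-class fraction and binary Shannon-entropy definitions; a minimal sketch (function names are illustrative):

```python
import math

def purity(n_matches, n_non_matches):
    """Fraction of the sample belonging to the majority class."""
    total = n_matches + n_non_matches
    return max(n_matches, n_non_matches) / total

def entropy(n_matches, n_non_matches):
    """Binary Shannon entropy (base 2) of the match/non-match split."""
    total = n_matches + n_non_matches
    h = 0.0
    for count in (n_matches, n_non_matches):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

# Counts from this oracle step: 28 matches, 55 non-matches
print(round(purity(28, 55), 3))   # 0.663, as reported above
print(round(entropy(28, 55), 3))  # 0.922
```

With 67 non-matches and no matches the same functions give purity 1.000 and entropy 0.000, matching the earlier all-non-match step.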
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 524 weight vectors
Based on 28 matches and 55 non-matches
Classified 179 matches and 345 non-matches
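The SVM step trains on the oracle-labelled vectors and labels the rest of the cluster; a sketch assuming scikit-learn's `SVC` with a linear kernel (the tool's actual kernel and parameters are not shown in the log, and the training data below is illustrative):

```python
from sklearn.svm import SVC

# Toy stand-ins for the oracle-labelled weight vectors
# (1 = match, 0 = non-match); values are illustrative only.
X_train = [[0.9, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
y_train = [1, 1, 0, 0, 0]

clf = SVC(kernel="linear").fit(X_train, y_train)

# Remaining unlabelled weight vectors in the cluster
X_rest = [[0.95, 0.85], [0.15, 0.05]]
print(list(clf.predict(X_rest)))  # [1, 0]
```

The predicted matches and non-matches form the two child clusters pushed onto the queue, which is why the next loop shows a queue of length 2.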
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(179, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(345, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 179 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 179 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.947, 1.000, 0.292, 0.178, 0.227, 0.122, 0.154] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 44 matches and 14 non-matches
Purity of oracle classification: 0.759
Entropy of oracle classification: 0.797
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 14
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(20)802_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 802), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)802_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 112 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 112 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and wrongly classify 0
Classified 44 matches and 1 non-matches
Purity of oracle classification: 0.978
Entropy of oracle classification: 0.154
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)495_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 495), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)495_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 801
Number of entity ID pairs that occurred more than once: 0
Analyse set of 801 weight vectors
Containing 220 true matches and 581 true non-matches
(27.47% true matches)
Identified 763 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 745 (97.64%)
2 : 15 (1.97%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 763 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 578
Removed 1 non-pure weight vector
Final number of weight vectors to use: 800
Number of unique weight vectors: 763
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(763, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 763 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 763 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
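The purity and entropy figures reported above are the standard majority-class fraction and binary Shannon entropy of the oracle's match/non-match labels. A minimal sketch (the function name is illustrative, not from the original code):

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity and binary entropy of a cluster's oracle labels."""
    total = num_matches + num_non_matches
    p = num_matches / total           # match proportion
    purity = max(p, 1.0 - p)          # fraction in the majority class
    # Binary Shannon entropy in bits; 0.0 for a pure cluster
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = cluster_stats(28, 57)
print(round(purity, 3), round(entropy, 3))  # → 0.671 0.914
```

With 28 matches and 57 non-matches this reproduces the 0.671 purity and 0.914 entropy logged above.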
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 678 weight vectors
Based on 28 matches and 57 non-matches
Classified 135 matches and 543 non-matches
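The SVM step trains on the oracle-labelled sample and splits the remaining cluster into predicted matches and non-matches, which become the two new clusters in the queue. A sketch of that step, assuming scikit-learn; the feature data here is random stand-in data, whereas the real run used the 85 oracle-labelled 7-dimensional weight vectors:

```python
# Train an SVM on the oracle-labelled weight vectors, then partition the
# still-unlabelled vectors of the cluster into two new clusters.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
train_X = rng.random((85, 7))            # stand-in for oracle-labelled vectors
train_y = np.array([1] * 28 + [0] * 57)  # 28 matches, 57 non-matches
rest_X = rng.random((678, 7))            # stand-in for unlabelled vectors

clf = SVC(kernel="linear").fit(train_X, train_y)
pred = clf.predict(rest_X)

match_cluster = rest_X[pred == 1]        # queued as the "match" cluster
non_match_cluster = rest_X[pred == 0]    # queued as the "non-match" cluster
print(len(match_cluster), len(non_match_cluster))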
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 135 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 135 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
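The "farthest first" selection used throughout this run repeatedly picks the vector whose minimum distance to the already-selected set is largest, giving a sample that spreads over the cluster. A minimal O(k·n) sketch with Euclidean distance (seeding with vector 0 is an assumption; the original may seed differently):

```python
import math

def dist(a, b):
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(vectors, k):
    """Indices of k vectors chosen by farthest-first traversal."""
    selected = [0]  # seed with an arbitrary first vector
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = [dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

vecs = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (0.0, 10.0)]
print(farthest_first(vecs, 3))  # → [0, 2, 3]
```

On the toy data the traversal first jumps to the point farthest from the seed, then to the point farthest from both.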
46.0
Analysing file: diverg(10)881_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990099
recall 0.334448
f-measure 0.5
da 101
dm 0
ndm 0
tp 100
fp 1
tn 4.76529e+07
fn 199
Name: (10, 1 - acm diverg, 881), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)881_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 983
Number of entity ID pairs that occurred more than once: 0
Analyse set of 983 weight vectors
Containing 164 true matches and 819 true non-matches
(16.68% true matches)
Identified 944 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 915 (96.93%)
2 : 26 (2.75%)
3 : 2 (0.21%)
10 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 944 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 145
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 798
Removed 1 non-pure weight vector
Final number of weight vectors to use: 982
Number of unique weight vectors: 944
Time to load and analyse the weight vector file: 0.01 sec
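The load-and-analyse step above groups identical weight vectors and computes, per unique vector, the fraction of its occurrences that are true matches (its "pureness"); vectors with mixed labels are then removed. A minimal sketch with toy data (function and variable names are illustrative):

```python
from collections import defaultdict

def pureness_analysis(vectors, labels):
    """Group identical weight vectors; return {vector: fraction of matches}."""
    counts = defaultdict(lambda: [0, 0])  # vector -> [matches, total]
    for v, is_match in zip(vectors, labels):
        counts[tuple(v)][0] += int(is_match)
        counts[tuple(v)][1] += 1
    return {v: m / t for v, (m, t) in counts.items()}

# Toy data: one vector occurs three times with mixed labels (non-pure)
vecs = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5), (0.2, 0.1)]
labs = [True, True, False, False]
pure = pureness_analysis(vecs, labs)
non_pure = [v for v, p in pure.items() if 0.0 < p < 1.0]
print(non_pure)  # → [(1.0, 0.5)], pureness 2/3
```

In the run above exactly one unique vector (pureness 0.900) fell between 0 and 1 and was removed.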
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(944, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 944 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 944 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 32 matches and 55 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 857 weight vectors
Based on 32 matches and 55 non-matches
Classified 286 matches and 571 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(286, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
(571, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
Current size of match and non-match training data sets: 32 / 55
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 571 weight vectors
- Estimated match proportion 0.368
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 571 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.538, 0.789, 0.353, 0.545, 0.550] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.444, 0.643, 0.421, 0.200, 0.556] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.350, 0.455, 0.625, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.667, 0.444, 0.556, 0.222, 0.143] (False)
[1.000, 0.000, 0.583, 0.389, 0.471, 0.545, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.750, 0.533, 0.294, 0.333, 0.429] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
101.0
Analysing file: diverg(10)369_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 369), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)369_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 797
Number of entity ID pairs that occurred more than once: 0
Analyse set of 797 weight vectors
Containing 207 true matches and 590 true non-matches
(25.97% true matches)
Identified 750 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 715 (95.33%)
2 : 32 (4.27%)
3 : 2 (0.27%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 750 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 569
Removed 1 non-pure weight vector
Final number of weight vectors to use: 796
Number of unique weight vectors: 750
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(750, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 750 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 750 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 24 matches and 61 non-matches
Purity of oracle classification: 0.718
Entropy of oracle classification: 0.859
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 665 weight vectors
Based on 24 matches and 61 non-matches
Classified 103 matches and 562 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7176470588235294, 0.8586370819183629, 0.2823529411764706)
(562, 0.7176470588235294, 0.8586370819183629, 0.2823529411764706)
Current size of match and non-match training data sets: 24 / 61
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 103 weight vectors
- Estimated match proportion 0.282
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 103 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and wrongly classify 0
Classified 44 matches and 1 non-matches
Purity of oracle classification: 0.978
Entropy of oracle classification: 0.154
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(20)984_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 984), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)984_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
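The farthest-first selections shown in this log can be sketched as a greedy traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch with Euclidean distance (the actual distance measure and starting rule used by the run are assumptions):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of weight vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        # Pick the remaining vector farthest from its nearest selected vector.
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```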
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
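The purity and entropy figures reported above are consistent with the usual two-class definitions (purity = majority-class fraction, Shannon entropy in bits); a minimal sketch reproducing them:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Two-class purity (majority-class fraction) and entropy in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Reproduces the figures above: 29 matches, 56 non-matches.
purity, entropy = purity_and_entropy(29, 56)
print(f"{purity:.3f} {entropy:.3f}")  # 0.659 0.926
```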
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 29 matches and 56 non-matches
Classified 146 matches and 538 non-matches
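The SVM step (train on the oracle-labelled sample, then split the remaining vectors of the cluster into predicted matches and non-matches) could look roughly like this with scikit-learn. The library, kernel, and parameters are assumptions — the log does not name them:

```python
from sklearn import svm

def svm_split(train_vectors, train_labels, cluster_vectors):
    """Train an SVM on the oracle-classified sample and partition the
    remaining weight vectors into predicted matches / non-matches."""
    clf = svm.SVC()  # default RBF kernel; the run's settings are unknown
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(cluster_vectors)
    matches = [v for v, p in zip(cluster_vectors, preds) if p]
    non_matches = [v for v, p in zip(cluster_vectors, preds) if not p]
    return matches, non_matches
```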
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(538, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 538 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 538 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.407, 0.643, 0.667, 0.500, 0.563] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 9 matches and 65 non-matches
Purity of oracle classification: 0.878
Entropy of oracle classification: 0.534
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)694_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.980583
recall 0.337793
f-measure 0.502488
da 103
dm 0
ndm 0
tp 101
fp 2
tn 4.76529e+07
fn 198
Name: (10, 1 - acm diverg, 694), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)694_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 453
Number of entity ID pairs that occurred more than once: 0
Analyse set of 453 weight vectors
Containing 146 true matches and 307 true non-matches
(32.23% true matches)
Identified 441 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 433 (98.19%)
2 : 5 (1.13%)
3 : 2 (0.45%)
4 : 1 (0.23%)
Identified 0 non-pure unique weight vectors (from 441 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 134
0.000 : 307
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 453
Number of unique weight vectors: 441
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(441, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 441 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 441 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 28 matches and 51 non-matches
Purity of oracle classification: 0.646
Entropy of oracle classification: 0.938
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 362 weight vectors
Based on 28 matches and 51 non-matches
Classified 108 matches and 254 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(108, 0.6455696202531646, 0.9379626436434423, 0.35443037974683544)
(254, 0.6455696202531646, 0.9379626436434423, 0.35443037974683544)
Current size of match and non-match training data sets: 28 / 51
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 108 weight vectors
- Estimated match proportion 0.354
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 108 vectors
The selected farthest weight vectors are:
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 43 matches and 6 non-matches
Purity of oracle classification: 0.878
Entropy of oracle classification: 0.536
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing the file: diverg(20)157_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 157), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)157_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 667
Number of entity ID pairs that occurred more than once: 0
Analyse set of 667 weight vectors
Containing 217 true matches and 450 true non-matches
(32.53% true matches)
Identified 630 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 612 (97.14%)
2 : 15 (2.38%)
3 : 2 (0.32%)
19 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 630 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 447
Removed 1 non-pure weight vector
Final number of weight vectors to use: 666
Number of unique weight vectors: 630
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(630, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 630 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 630 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 27 matches and 56 non-matches
Purity of oracle classification: 0.675
Entropy of oracle classification: 0.910
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 547 weight vectors
Based on 27 matches and 56 non-matches
Classified 133 matches and 414 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.6746987951807228, 0.9100534290139191, 0.3253012048192771)
(414, 0.6746987951807228, 0.9100534290139191, 0.3253012048192771)
Current size of match and non-match training data sets: 27 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 414 weight vectors
- Estimated match proportion 0.325
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 414 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 12 matches and 58 non-matches
Purity of oracle classification: 0.829
Entropy of oracle classification: 0.661
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
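The purity and entropy reported above follow directly from the match/non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy of the split. A minimal sketch (the helper name `purity_entropy` is my own, not from the tool):

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy of the match/non-match split, in bits."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy

p, e = purity_entropy(12, 58)
print(round(p, 3), round(e, 3))  # 0.829 0.661, matching the log above
```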
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)493_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 493), dtype: object
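The precision, recall, and f-measure in the Series above are consistent with its tp/fp/fn counts; a quick check with a hypothetical helper (not part of the pipeline):

```python
def prf(tp, fp, fn):
    """Standard precision/recall/F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

precision, recall, f = prf(39, 0, 260)
print(precision, round(recall, 6), round(f, 6))  # 1.0 0.130435 0.230769
```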
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)493_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 845
Number of entity ID pairs that occurred more than once: 0
Analyse set of 845 weight vectors
Containing 227 true matches and 618 true non-matches
(26.86% true matches)
Identified 788 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 751 (95.30%)
2 : 34 (4.31%)
3 : 2 (0.25%)
20 : 1 (0.13%)
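The occurrence distribution above can be reproduced with two nested `Counter`s; toy data is used here, since the actual 845 vectors are not listed:

```python
from collections import Counter

# toy weight vectors (as tuples, so they are hashable);
# the real run counts 845 seven-dimensional vectors
vectors = [(1.0, 0.5), (1.0, 0.5), (0.3, 0.9),
           (0.0, 0.2), (0.0, 0.2), (0.0, 0.2)]

vec_counts = Counter(vectors)             # occurrences of each unique vector
freq_dist = Counter(vec_counts.values())  # occurrence -> number of unique vectors
print(sorted(freq_dist.items()))  # [(1, 1), (2, 1), (3, 1)]
```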
Identified 1 non-pure unique weight vector (from 788 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 597
Removed 1 non-pure weight vector
Final number of weight vectors to use: 844
Number of unique weight vectors: 788
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(788, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 788 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 788 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
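The "far" method above is a greedy farthest-first traversal: after a first pick, each step takes the vector whose minimum Euclidean distance to the already-selected set is largest. A dependency-light sketch (the fixed `start` index and function name are my choices, not necessarily the tool's):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedily select k row indices: begin at `start`, then repeatedly
    take the vector farthest (in min Euclidean distance) from the
    already-selected set."""
    X = np.asarray(vectors, dtype=float)
    chosen = [start]
    min_dist = np.linalg.norm(X - X[start], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

print(farthest_first([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0]], 3))  # [0, 2, 3]
```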
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 703 weight vectors
Based on 29 matches and 56 non-matches
Classified 162 matches and 541 non-matches
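The SVM step trains on the 29 + 56 oracle-labelled vectors and then labels the remaining cluster members. As a dependency-free stand-in for the SVM (the actual run presumably uses a proper SVM implementation), a plain perceptron on synthetic 2-D data illustrates the idea:

```python
import numpy as np

def train_linear(X, y, epochs=100, lr=0.1):
    """Perceptron stand-in for the linear SVM: learns weights w and bias b
    so that sign(x @ w + b) separates matches (1) from non-matches (0)."""
    X = np.asarray(X, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - (1 if xi @ w + b > 0 else 0)
            w += lr * err * xi
            b += lr * err
    return w, b

# synthetic training data: matches have high similarities, non-matches low
X_train = [[0.9, 0.95], [0.8, 1.0], [0.1, 0.2], [0.2, 0.05]]
y_train = [1, 1, 0, 0]
w, b = train_linear(X_train, y_train)
predict = lambda x: 1 if np.dot(x, w) + b > 0 else 0
print(predict([0.85, 0.9]), predict([0.15, 0.1]))  # 1 0
```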
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(162, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(541, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 162 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 162 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)632_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 632), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)632_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 101 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)896_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 896), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)896_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 831
Number of entity ID pairs that occurred more than once: 0
Analyse set of 831 weight vectors
Containing 227 true matches and 604 true non-matches
(27.32% true matches)
Identified 774 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 737 (95.22%)
2 : 34 (4.39%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 774 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 583
Removed 1 non-pure weight vector
Final number of weight vectors to use: 830
Number of unique weight vectors: 774
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(774, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 774 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 774 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
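The purity and entropy figures reported above follow the standard binary definitions: purity is the fraction of the majority class among the oracle-classified vectors, and entropy is the base-2 Shannon entropy of the match proportion. A minimal sketch (function names are illustrative, not from the original code):

```python
import math

def purity(num_matches, num_non_matches):
    """Fraction of the majority class among classified vectors."""
    total = num_matches + num_non_matches
    return max(num_matches, num_non_matches) / total

def entropy(num_matches, num_non_matches):
    """Base-2 Shannon entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Values logged above: 28 matches, 57 non-matches
print(round(purity(28, 57), 3))   # 0.671
print(round(entropy(28, 57), 3))  # 0.914
```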
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 689 weight vectors
Based on 28 matches and 57 non-matches
Classified 151 matches and 538 non-matches
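The SVM step trains on the oracle-labelled vectors and splits the remaining unlabelled ones into a predicted-match and a predicted-non-match cluster. A hedged sketch using scikit-learn (the log does not name the library; `LinearSVC` and the synthetic training data below are assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the 28 match / 57 non-match training vectors:
# matches concentrated near 1.0, non-matches near 0.0, in 7 dimensions.
train_matches = rng.uniform(0.7, 1.0, size=(28, 7))
train_non_matches = rng.uniform(0.0, 0.5, size=(57, 7))
X_train = np.vstack([train_matches, train_non_matches])
y_train = np.array([1] * 28 + [0] * 57)

clf = LinearSVC(dual=False).fit(X_train, y_train)

# Classify the remaining (unlabelled) weight vectors into two clusters,
# which then re-enter the cluster queue.
X_rest = rng.uniform(0.0, 1.0, size=(689, 7))
pred = clf.predict(X_rest)
match_cluster = X_rest[pred == 1]
non_match_cluster = X_rest[pred == 0]
print(len(match_cluster), len(non_match_cluster))
```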
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(538, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 151 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 51 matches and 3 non-matches
Purity of oracle classification: 0.944
Entropy of oracle classification: 0.310
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
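The "farthest first" selections logged throughout these runs follow the classic greedy traversal: start from one vector, then repeatedly pick the vector whose minimum distance to the already-selected set is largest. A self-contained sketch (Euclidean distance and the fixed starting index are assumptions):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedily select k indices, each maximising its minimum
    Euclidean distance to the vectors selected so far."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # Minimum distance from every vector to the selected set so far.
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

corners = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]]
print(farthest_first(corners, 3))  # [0, 3, 1] -- opposite corners first
```

Each new pick only requires updating the running minimum distances, so selecting k of n vectors costs O(k·n) distance evaluations.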
39.0
Analysing file: diverg(20)875_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 875), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)875_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
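The frequency and pureness statistics above amount to a simple grouping pass: count how often each unique weight vector occurs, then compute for each unique vector the fraction of its occurrences labelled as matches (non-pure vectors have a fraction strictly between 0 and 1, and their minority-class copies are removed). A sketch with small illustrative data (the data itself is hypothetical):

```python
from collections import Counter, defaultdict

# Hypothetical (vector, is_match) pairs; tuples keep vectors hashable.
data = [((1.0, 0.9), True)] * 19 + [((1.0, 0.9), False)] \
     + [((0.1, 0.2), False)] * 2 + [((0.8, 0.8), True)]

occurrences = Counter(vec for vec, _ in data)
# "Occurrence : Number of weight vectors that occur that often"
freq_dist = Counter(occurrences.values())

labels = defaultdict(list)
for vec, is_match in data:
    labels[vec].append(is_match)
# Pureness as the fraction of matches per unique vector.
pureness = {vec: sum(ls) / len(ls) for vec, ls in labels.items()}

print(dict(freq_dist))                 # {20: 1, 2: 1, 1: 1}
print(round(pureness[(1.0, 0.9)], 3))  # 0.95 -- a non-pure vector
```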
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
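The oracle in these runs is simulated with a configurable accuracy (here 100.00%, so every label is returned correctly and the false-match/false-non-match counts stay at zero). A minimal sketch of such a noisy oracle (the per-label flip mechanism is an assumption):

```python
import random

def noisy_oracle(true_labels, accuracy, seed=42):
    """Return true_labels with each label flipped with probability 1 - accuracy."""
    rng = random.Random(seed)
    return [lbl if rng.random() < accuracy else not lbl
            for lbl in true_labels]

labels = [True] * 43 + [False]
# With accuracy 1.0 the oracle classifies every vector correctly.
assert noisy_oracle(labels, accuracy=1.0) == labels
print(sum(noisy_oracle(labels, 1.0)), "matches")  # 43 matches
```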
39.0
Analysing file: diverg(15)931_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976
recall 0.408027
f-measure 0.575472
da 125
dm 0
ndm 0
tp 122
fp 3
tn 4.76529e+07
fn 177
Name: (15, 1 - acm diverg, 931), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)931_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 969
Number of entity ID pairs that occurred more than once: 0
Analyse set of 969 weight vectors
Containing 143 true matches and 826 true non-matches
(14.76% true matches)
Identified 935 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 906 (96.90%)
2 : 26 (2.78%)
3 : 2 (0.21%)
5 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 935 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 129
0.000 : 806
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 969
Number of unique weight vectors: 935
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(935, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 935 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 935 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 848 weight vectors
Based on 27 matches and 60 non-matches
Classified 91 matches and 757 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(91, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(757, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 91 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 91 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 40 matches and 3 non-matches
Purity of oracle classification: 0.930
Entropy of oracle classification: 0.365
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
125.0
Analysing file: diverg(15)52_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 52), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)52_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 544
Number of entity ID pairs that occurred more than once: 0
Analyse set of 544 weight vectors
Containing 209 true matches and 335 true non-matches
(38.42% true matches)
Identified 513 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 498 (97.08%)
2 : 12 (2.34%)
3 : 2 (0.39%)
16 : 1 (0.19%)
Identified 1 non-pure unique weight vector (from 513 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 334
Removed 1 non-pure weight vector
Final number of weight vectors to use: 543
Number of unique weight vectors: 513
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(513, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 513 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 513 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 30 matches and 51 non-matches
Purity of oracle classification: 0.630
Entropy of oracle classification: 0.951
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
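The purity and entropy figures logged above are consistent with the usual binary definitions: purity as the majority-class fraction of the oracle-labelled sample, entropy as the binary Shannon entropy of the match/non-match split. A minimal sketch under that assumption:

```python
import math

def purity_entropy(num_matches: int, num_non_matches: int):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total          # match proportion
    purity = max(p, 1.0 - p)         # fraction of the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# For the sample above (30 matches, 51 non-matches):
# purity ≈ 0.630 and entropy ≈ 0.951, matching the logged values.
purity, entropy = purity_entropy(30, 51)
```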
SVM classification of 432 weight vectors
Based on 30 matches and 51 non-matches
Classified 151 matches and 281 non-matches
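The SVM step above trains on the oracle-labelled sample and partitions the remaining, unlabelled weight vectors of the cluster by predicted class. A hypothetical sketch with scikit-learn (the kernel and hyper-parameters are assumptions, not taken from the log):

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labelled_vecs, labels, remaining_vecs):
    """Split a cluster's remaining weight vectors into predicted
    matches and non-matches, based on the oracle-labelled sample.
    labels: boolean array, True = match."""
    clf = SVC(kernel="linear")       # assumed kernel
    clf.fit(labelled_vecs, labels)
    pred = clf.predict(remaining_vecs)
    matches = remaining_vecs[pred]       # predicted-match sub-cluster
    non_matches = remaining_vecs[~pred]  # predicted-non-match sub-cluster
    return matches, non_matches
```

Each sub-cluster then goes back onto the queue, which is why the queue length grows to 2 in the next loop.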
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6296296296296297, 0.9509560484549725, 0.37037037037037035)
(281, 0.6296296296296297, 0.9509560484549725, 0.37037037037037035)
Current size of match and non-match training data sets: 30 / 51
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 151 weight vectors
- Estimated match proportion 0.370
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.933, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
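The farthest-first traversal that produces selections like the one above can be sketched as a greedy loop: repeatedly pick the vector whose distance to the nearest already-selected vector is largest. The seed choice and the Euclidean metric below are assumptions of this sketch, not details taken from the log:

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int) -> np.ndarray:
    """Greedily select k vectors, each maximising its Euclidean
    distance to the closest already-selected vector."""
    selected = [0]  # assumed seed: the first vector
    # distance from every vector to its nearest selected vector
    dists = np.linalg.norm(vectors - vectors[0], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))      # farthest from current selection
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return vectors[selected]
```

This favours diverse, boundary-spanning samples, which is why the selected lists above mix clear matches and clear non-matches.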
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 49 matches and 7 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing the file: diverg(10)340_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976923
recall 0.424749
f-measure 0.592075
da 130
dm 0
ndm 0
tp 127
fp 3
tn 4.76529e+07
fn 172
Name: (10, 1 - acm diverg, 340), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)340_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 934
Number of entity ID pairs that occurred more than once: 0
Analyse set of 934 weight vectors
Containing 137 true matches and 797 true non-matches
(14.67% true matches)
Identified 900 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 871 (96.78%)
2 : 26 (2.89%)
3 : 2 (0.22%)
5 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 900 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 123
0.000 : 777
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 934
Number of unique weight vectors: 900
Time to load and analyse the weight vector file: 0.01 sec
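The duplicate-frequency and pureness analysis above can be sketched with a Counter over the weight tuples. Grouping by exact tuple equality and defining pureness as the match fraction among a vector's duplicates are assumptions of this sketch:

```python
from collections import Counter

def analyse_vectors(vectors, is_match):
    """Frequency distribution of identical weight vectors, plus the
    pureness (fraction of true matches) of each unique vector."""
    totals, matches = Counter(), Counter()
    for vec, m in zip(vectors, is_match):
        key = tuple(vec)
        totals[key] += 1
        matches[key] += int(m)
    # occurrence count -> number of unique vectors occurring that often
    freq_dist = Counter(totals.values())
    pureness = {k: matches[k] / totals[k] for k in totals}
    return freq_dist, pureness
```

A unique vector shared by both a match and a non-match pair gets a pureness strictly between 0 and 1, which is what the later "non-pure" removal step filters out.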
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(900, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 900 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 900 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 30 matches and 56 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 814 weight vectors
Based on 30 matches and 56 non-matches
Classified 236 matches and 578 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(236, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
(578, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
Current size of match and non-match training data sets: 30 / 56
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 236 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 64
Farthest first selection of 64 weight vectors from 236 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[0.512, 1.000, 0.087, 0.190, 0.107, 0.226, 0.204] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.971, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.817, 1.000, 0.194, 0.091, 0.163, 0.222, 0.200] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.600, 0.944, 0.250, 0.200, 0.186, 0.136, 0.118] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 39 matches and 25 non-matches
Purity of oracle classification: 0.609
Entropy of oracle classification: 0.965
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
130.0
Analysing the file: diverg(15)926_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 926), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)926_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 802
Number of entity ID pairs that occurred more than once: 0
Analyse set of 802 weight vectors
Containing 226 true matches and 576 true non-matches
(28.18% true matches)
Identified 745 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 708 (95.03%)
2 : 34 (4.56%)
3 : 2 (0.27%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 745 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 555
Removed 1 non-pure weight vector
Final number of weight vectors to use: 801
Number of unique weight vectors: 745
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(745, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 745 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 745 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 32 matches and 53 non-matches
Purity of oracle classification: 0.624
Entropy of oracle classification: 0.956
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 660 weight vectors
Based on 32 matches and 53 non-matches
Classified 331 matches and 329 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(331, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
(329, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
Current size of match and non-match training data sets: 32 / 53
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 329 weight vectors
- Estimated match proportion 0.376
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 329 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.600, 0.818, 0.571, 0.524] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.556, 0.348, 0.467, 0.636, 0.412] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.917, 0.786, 0.263, 0.500, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.269, 0.478, 0.750, 0.385, 0.455] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.571, 0.857, 0.583, 0.667, 0.889] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.583, 0.786, 0.842, 0.800, 0.833] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
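The purity and entropy figures reported throughout this log are the majority-class fraction and the binary Shannon entropy of the oracle's match/non-match split. A minimal sketch (the helper name is ours, not the tool's) that reproduces the logged values, e.g. purity 1.000 and entropy 0.000 for the 0-match/71-non-match block above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy (base 2) of the match/non-match distribution."""
    total = num_matches + num_non_matches
    p_match = num_matches / total
    purity = max(p_match, 1.0 - p_match)
    entropy = 0.0
    for p in (p_match, 1.0 - p_match):
        if p > 0.0:                      # skip empty classes (0 * log 0 = 0)
            entropy -= p * math.log2(p)
    return purity, entropy
```

The same function also matches the mixed splits later in the log, e.g. 27 matches / 59 non-matches gives purity 0.686 and entropy 0.898.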
39.0
Analysing the file: diverg(10)373_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 373), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)373_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 893
Number of entity ID pairs that occurred more than once: 0
Analyse set of 893 weight vectors
Containing 198 true matches and 695 true non-matches
(22.17% true matches)
Identified 848 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 814 (95.99%)
2 : 31 (3.66%)
3 : 2 (0.24%)
11 : 1 (0.12%)
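The occurrence distribution above (how many unique weight vectors occur once, twice, etc.) can be sketched with two nested `collections.Counter` passes, assuming each vector is stored as a hashable tuple (the function name is ours):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    vector_counts = Counter(map(tuple, weight_vectors))  # vector -> occurrences
    distribution = Counter(vector_counts.values())       # occurrences -> #vectors
    total_unique = len(vector_counts)
    for occ in sorted(distribution):
        n = distribution[occ]
        print(f"{occ} : {n} ({100.0 * n / total_unique:.2f}%)")
    return distribution
```

The number of unique weight vectors reported in the log is simply `len(vector_counts)`.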
Identified 1 non-pure unique weight vector (from 848 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 674
Removed 1 non-pure weight vector
Final number of weight vectors to use: 892
Number of unique weight vectors: 848
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(848, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 848 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 848 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
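The "farthest first" selection used above greedily picks, at each step, the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and an arbitrary first seed (the log does not show the tool's actual metric or seeding):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly select the vector
    whose minimum distance to the selected set is largest."""
    selected = [vectors[0]]                               # arbitrary seed
    # minimum distance from each vector to the selected set so far
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):                   # update distances
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[idx]))
    return selected
```

Because the selection maximises spread, the sample tends to cover both extreme match-like and extreme non-match-like weight vectors, which is why the lists above mix True and False labels.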
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 27 matches and 59 non-matches
Purity of oracle classification: 0.686
Entropy of oracle classification: 0.898
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 762 weight vectors
Based on 27 matches and 59 non-matches
Classified 194 matches and 568 non-matches
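The SVM step trains on the oracle-labelled weight vectors and propagates match/non-match labels to the rest of the cluster, which then splits into a predicted-match and a predicted-non-match cluster. A minimal sketch using scikit-learn; the linear kernel and function name are assumptions, since the log does not show the actual model settings:

```python
from sklearn.svm import SVC

def classify_remaining(train_vectors, train_labels, remaining_vectors):
    """Fit an SVM on oracle-classified vectors (True = match) and
    predict labels for the vectors the oracle has not seen."""
    clf = SVC(kernel="linear")
    clf.fit(train_vectors, train_labels)
    predictions = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, predictions) if p]
    non_matches = [v for v, p in zip(remaining_vectors, predictions) if not p]
    return matches, non_matches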
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(194, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
(568, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
Current size of match and non-match training data sets: 27 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.90
- Size 194 weight vectors
- Estimated match proportion 0.314
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 194 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.530, 1.000, 0.159, 0.086, 0.182, 0.159, 0.163] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 41 matches and 17 non-matches
Purity of oracle classification: 0.707
Entropy of oracle classification: 0.873
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 17
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(20)51_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 51), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)51_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 29 matches and 59 non-matches
Purity of oracle classification: 0.670
Entropy of oracle classification: 0.914
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 29 matches and 59 non-matches
Classified 162 matches and 777 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(162, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
(777, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
Current size of match and non-match training data sets: 29 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 162 weight vectors
- Estimated match proportion 0.330
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 162 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)393_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 393), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)393_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 795
Number of entity ID pairs that occurred more than once: 0
Analyse set of 795 weight vectors
Containing 209 true matches and 586 true non-matches
(26.29% true matches)
Identified 748 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 713 (95.32%)
2 : 32 (4.28%)
3 : 2 (0.27%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 748 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 565
Removed 1 non-pure weight vector
Final number of weight vectors to use: 794
Number of unique weight vectors: 748
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(748, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 748 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 748 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 663 weight vectors
Based on 29 matches and 56 non-matches
Classified 155 matches and 508 non-matches
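The purity and entropy figures reported in this log are consistent with the majority-class fraction and the binary Shannon entropy of the match proportion among the oracle-labelled vectors. A minimal sketch (function names are mine, not the tool's):

```python
from math import log2

def purity(n_matches, n_non_matches):
    """Majority-class fraction among the oracle-labelled vectors."""
    return max(n_matches, n_non_matches) / (n_matches + n_non_matches)

def entropy(n_matches, n_non_matches):
    """Binary Shannon entropy of the match proportion (0 for a pure cluster)."""
    p = n_matches / (n_matches + n_non_matches)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1.0 - p) * log2(1.0 - p))

# 29 matches / 56 non-matches, as in the oracle step above:
print(purity(29, 56))   # ≈ 0.6588
print(entropy(29, 56))  # ≈ 0.9259
```

Both values match the (155, 0.6588..., 0.9259..., 0.3411...) cluster tuples printed for the queue, since the split clusters inherit the sample's estimated match proportion.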
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(155, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(508, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 508 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 508 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.462, 0.609, 0.684, 0.308, 0.545] (False)
[0.817, 1.000, 0.250, 0.212, 0.256, 0.045, 0.250] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
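The farthest-first sampling that produced the selection above can be sketched as a greedy farthest-first traversal. Euclidean distance and the seed choice are assumptions here, since the log does not show them:

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first traversal: start from a seed, then repeatedly
    pick the vector whose minimum distance to the selected set is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [seed_index]
    # minimum distance from each vector to the selected set so far
    min_dist = [dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        far = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(far)
        for i, v in enumerate(vectors):
            d = dist(v, vectors[far])
            if d < min_dist[i]:
                min_dist[i] = d
    return [vectors[i] for i in selected]
```

The point of this selection strategy is diversity: each new sample is as far as possible from everything already chosen, which spreads the oracle budget across the cluster instead of concentrating it in one dense region.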
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 2 matches and 72 non-matches
Purity of oracle classification: 0.973
Entropy of oracle classification: 0.179
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(15)906_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 906), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)906_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 801
Number of entity ID pairs that occurred more than once: 0
Analyse set of 801 weight vectors
Containing 222 true matches and 579 true non-matches
(27.72% true matches)
Identified 747 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 710 (95.05%)
2 : 34 (4.55%)
3 : 2 (0.27%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 747 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 558
Removed 1 non-pure weight vector
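A unique weight vector's pureness is the fraction of its duplicate copies labelled as matches; copies carrying the minority label of a non-pure vector are dropped. A sketch of that cleaning step (function name and tie-breaking are mine):

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """Keep only the copies of each unique weight vector that carry its
    majority label; drop the minority-class copies of non-pure vectors."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, lab in zip(vectors, labels):
        labs = groups[tuple(vec)]
        majority = sum(labs) > len(labs) / 2   # ties count as non-match
        if lab == majority:
            kept.append((vec, lab))
    return kept
```

This matches the log's arithmetic: the one vector occurring 17 times with pureness 0.941 has 16 match copies and 1 non-match copy (16/17 ≈ 0.941), and removing the single minority copy takes the set from 801 to 800 vectors.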
Final number of weight vectors to use: 800
Number of unique weight vectors: 747
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(747, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 747 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using the "far" method
Farthest first selection of 85 weight vectors from 747 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 662 weight vectors
Based on 27 matches and 58 non-matches
Classified 148 matches and 514 non-matches
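The SVM step above trains on the oracle-labelled samples (27 matches, 58 non-matches) and classifies the remaining vectors to split the cluster. The log does not show which SVM implementation is used (scikit-learn's `svm.SVC` would be a natural choice); as a self-contained stand-in, here is a linear SVM trained with Pegasos-style subgradient steps:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style subgradient training of a linear SVM (no bias term).
    X: feature vectors; y: labels in {-1, +1} (match / non-match)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            score = sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - 1.0 / t) * wj for wj in w]   # shrink: (1 - eta*lam)
            if y[i] * score < 1:                      # margin violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def classify(w, x):
    """+1 = match, -1 = non-match."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

Classifying every unlabelled vector with `classify` then yields the two new clusters pushed onto the queue, exactly as the "Classified 148 matches and 514 non-matches" line records for the real tool.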
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(514, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 514 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 514 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 8 matches and 64 non-matches
Purity of oracle classification: 0.889
Entropy of oracle classification: 0.503
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)109_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 109), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)109_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 392
Number of entity ID pairs that occurred more than once: 0
Analyse set of 392 weight vectors
Containing 218 true matches and 174 true non-matches
(55.61% true matches)
Identified 359 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 343 (95.54%)
2 : 13 (3.62%)
3 : 2 (0.56%)
17 : 1 (0.28%)
Identified 1 non-pure unique weight vector (from 359 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 173
Removed 1 non-pure weight vector
Final number of weight vectors to use: 391
Number of unique weight vectors: 359
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(359, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 359 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using the "far" method
Farthest first selection of 76 weight vectors from 359 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 43 matches and 33 non-matches
Purity of oracle classification: 0.566
Entropy of oracle classification: 0.987
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 33
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 283 weight vectors
Based on 43 matches and 33 non-matches
Classified 283 matches and 0 non-matches
42.0
Analysing file: diverg(10)731_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 731), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)731_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 623
Number of entity ID pairs that occurred more than once: 0
Analyse set of 623 weight vectors
Containing 194 true matches and 429 true non-matches
(31.14% true matches)
Identified 574 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 540 (94.08%)
2 : 31 (5.40%)
3 : 2 (0.35%)
15 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 574 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 165
0.933 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 408
Removed 1 non-pure weight vector
Final number of weight vectors to use: 622
Number of unique weight vectors: 574
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(574, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 574 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using the "far" method
Farthest first selection of 82 weight vectors from 574 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 30 matches and 52 non-matches
Purity of oracle classification: 0.634
Entropy of oracle classification: 0.947
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
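The purity and entropy figures reported above follow directly from the oracle's match/non-match counts. A minimal sketch, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a cluster."""
    total = num_matches + num_non_matches
    p = num_matches / total                 # match proportion
    purity = max(p, 1.0 - p)                # fraction in the majority class
    if p in (0.0, 1.0):
        entropy = 0.0                       # a pure cluster has zero entropy
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

# The 30 matches / 52 non-matches classified above:
purity, entropy = purity_entropy(30, 52)
print(round(purity, 3), round(entropy, 3))  # 0.634 0.947
```

These values match the "Purity of oracle classification: 0.634" and "Entropy of oracle classification: 0.947" lines above.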
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 492 weight vectors
Based on 30 matches and 52 non-matches
Classified 155 matches and 337 non-matches
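After the oracle labels the sampled vectors, the remaining vectors in the cluster are split by an SVM trained on that labelled sample. A sketch with scikit-learn; the kernel choice and function name are assumptions, not the tool's actual code:

```python
import numpy as np
from sklearn import svm

def svm_split(labelled, labels, unlabelled):
    """Train an SVM on the oracle-labelled sample, then split the
    remaining weight vectors into predicted matches and non-matches."""
    clf = svm.SVC(kernel="linear")          # kernel choice is an assumption
    clf.fit(labelled, labels)               # e.g. 30 matches, 52 non-matches
    pred = clf.predict(unlabelled)
    return unlabelled[pred == 1], unlabelled[pred == 0]
```

The two resulting sub-clusters are then pushed back onto the queue, which is why the queue length grows to 2 in the next loop.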
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(155, 0.6341463414634146, 0.9474351361840306, 0.36585365853658536)
(337, 0.6341463414634146, 0.9474351361840306, 0.36585365853658536)
Current size of match and non-match training data sets: 30 / 52
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 337 weight vectors
- Estimated match proportion 0.366
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 337 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.538, 0.500, 0.818, 0.789, 0.750] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[1.000, 0.000, 0.750, 0.778, 0.471, 0.727, 0.684] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.833, 0.571, 0.727, 0.647, 0.857] (False)
[1.000, 0.000, 0.857, 0.286, 0.500, 0.643, 0.600] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.917, 0.786, 0.263, 0.500, 0.556] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.692, 0.292, 0.500, 0.818, 0.308] (False)
[0.800, 0.000, 0.625, 0.571, 0.467, 0.474, 0.667] (False)
[1.000, 0.000, 0.423, 0.478, 0.500, 0.813, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.333, 0.214, 0.333, 0.588, 0.476] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.583, 0.389, 0.471, 0.545, 0.474] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.385, 0.391, 0.667, 0.579, 0.824] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.000, 0.700, 0.818, 0.444, 0.619] (False)
[1.000, 0.000, 0.857, 0.444, 0.556, 0.235, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.750, 0.533, 0.294, 0.333, 0.429] (False)
[1.000, 0.000, 0.333, 0.750, 0.667, 0.667, 0.571] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.067, 0.550, 0.818, 0.727, 0.762] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
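The "farthest first selection" that produced lists like the one above can be sketched as a greedy max-min traversal; the distance metric (Euclidean here) and seed choice are assumptions:

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Farthest-first traversal: greedily pick k vectors, each time
    taking the one whose distance to the closest already-selected
    vector is largest (seed index is an assumption)."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # min distance from every vector to the selected set so far
    dists = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

This spreads the oracle's labelling budget over the cluster instead of querying near-duplicate weight vectors.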
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 0 matches and 70 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
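Every run in this log repeats the same budgeted refinement loop: pop a cluster, sample it, query the oracle, and split what remains. In outline, under the assumption that sampling, oracle, and SVM splitting are pluggable helpers (the thresholds are also assumptions):

```python
import random

def refine(queue, budget, sample_fn, oracle_fn, split_fn,
           min_purity=0.95, max_size=100):
    """Budgeted cluster-refinement loop in outline; min_purity,
    max_size and the three helper hooks are assumptions."""
    used = 0
    while queue and used < budget:
        cluster = queue.pop(random.randrange(len(queue)))  # random queue ordering
        sample = sample_fn(cluster)        # e.g. farthest-first selection
        labels = oracle_fn(sample)         # manual oracle classifications
        used += len(sample)
        rest = [v for v in cluster if v not in sample]
        n_match = sum(labels)
        purity = max(n_match, len(labels) - n_match) / len(labels)
        if rest and (purity < min_purity or len(rest) > max_size):
            queue.extend(split_fn(rest, sample, labels))   # SVM split step
    return used
```

The loop terminates either when the queue is empty or, as in the runs above, when the manual classification budget is exhausted.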
Analysing file: diverg(10)651_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 651), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)651_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 487
Number of entity ID pairs that occurred more than once: 0
Analyse set of 487 weight vectors
Containing 222 true matches and 265 true non-matches
(45.59% true matches)
Identified 451 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of unique weight vectors that occur that often
1 : 432 (95.79%)
2 : 16 (3.55%)
3 : 2 (0.44%)
17 : 1 (0.22%)
Identified 1 non-pure unique weight vector (from 451 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 262
Removed 1 non-pure weight vector
Final number of weight vectors to use: 486
Number of unique weight vectors: 451
Time to load and analyse the weight vector file: 0.04 sec
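The load-and-analyse step above (unique-vector frequencies plus per-vector pureness) can be sketched with pandas; the column names are assumptions for illustration:

```python
import pandas as pd

def analyse_weight_vectors(df, weight_cols, label_col="match"):
    """Frequency of identical weight vectors and the pureness
    (proportion of true matches) of each unique vector
    (column names are assumptions)."""
    grouped = df.groupby(weight_cols)[label_col]
    counts = grouped.size()
    # Occurrence : number of unique weight vectors occurring that often
    freq_dist = counts.value_counts().sort_index()
    pureness = grouped.mean()          # fraction of matches per unique vector
    non_pure = pureness[(pureness > 0) & (pureness < 1)]
    return freq_dist, non_pure
```

Non-pure unique vectors (pureness strictly between 0 and 1) are the ones the tool removes before clustering, since their label is ambiguous.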
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(451, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 451 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 451 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 36 matches and 43 non-matches
Purity of oracle classification: 0.544
Entropy of oracle classification: 0.994
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 372 weight vectors
Based on 36 matches and 43 non-matches
Classified 148 matches and 224 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.5443037974683544, 0.9943290455933882, 0.45569620253164556)
(224, 0.5443037974683544, 0.9943290455933882, 0.45569620253164556)
Current size of match and non-match training data sets: 36 / 43
Selected cluster with (queue ordering: random):
- Purity 0.54 and entropy 0.99
- Size 224 weight vectors
- Estimated match proportion 0.456
Sample size for this cluster: 67
Farthest first selection of 67 weight vectors from 224 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 67 weight vectors
The oracle will correctly classify 67 weight vectors and wrongly classify 0
Classified 4 matches and 63 non-matches
Purity of oracle classification: 0.940
Entropy of oracle classification: 0.326
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 67 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)100_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 100), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)100_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 886
Number of entity ID pairs that occurred more than once: 0
Analyse set of 886 weight vectors
Containing 175 true matches and 711 true non-matches
(19.75% true matches)
Identified 847 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of unique weight vectors that occur that often
1 : 817 (96.46%)
2 : 27 (3.19%)
3 : 2 (0.24%)
9 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 847 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 156
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 690
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 877
Number of unique weight vectors: 846
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(846, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 846 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 846 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 760 weight vectors
Based on 24 matches and 62 non-matches
Classified 78 matches and 682 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(78, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(682, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 78 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 39
Farthest first selection of 39 weight vectors from 78 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.420, 1.000, 1.000, 1.000, 1.000, 1.000, 0.947] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
Perform oracle with 100.00% accuracy on 39 weight vectors
The oracle will correctly classify 39 weight vectors and wrongly classify 0
Classified 38 matches and 1 non-matches
Purity of oracle classification: 0.974
Entropy of oracle classification: 0.172
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 39 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing file: diverg(10)79_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 79), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)79_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 313
Number of entity ID pairs that occurred more than once: 0
Analyse set of 313 weight vectors
Containing 196 true matches and 117 true non-matches
(62.62% true matches)
Identified 289 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of unique weight vectors that occur that often
1 : 276 (95.50%)
2 : 10 (3.46%)
3 : 2 (0.69%)
11 : 1 (0.35%)
Identified 1 non-pure unique weight vector (from 289 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 172
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 116
Removed 1 non-pure weight vector
Final number of weight vectors to use: 312
Number of unique weight vectors: 289
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(289, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 289 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 72
Perform initial selection using "far" method
Farthest first selection of 72 weight vectors from 289 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
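The farthest-first selection above can be sketched as a greedy traversal (a standard formulation; the tool's actual distance metric and seeding are not shown in the log, so Euclidean distance and a random start are assumptions):

```python
import random

def farthest_first(vectors, k, dist=None, seed=None):
    """Greedy farthest-first traversal: start from a random vector, then
    repeatedly pick the vector farthest from its nearest selected vector."""
    if dist is None:
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    rng = random.Random(seed)
    remaining = list(vectors)
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining and len(selected) < k:
        # Distance of each candidate to its nearest already-selected vector.
        idx = max(range(len(remaining)),
                  key=lambda i: min(dist(remaining[i], s) for s in selected))
        selected.append(remaining.pop(idx))
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.5, 0.5)]
chosen = farthest_first(pts, 2, seed=0)
print(len(chosen))  # → 2
```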
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 36 matches and 36 non-matches
Purity of oracle classification: 0.500
Entropy of oracle classification: 1.000
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 36
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 217 weight vectors
Based on 36 matches and 36 non-matches
Classified 143 matches and 74 non-matches
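A minimal sketch of the SVM step above, using scikit-learn's `SVC` (the tool's actual kernel and parameters are not shown in the log, so `kernel="linear"` is an assumption):

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, unlabelled_vecs):
    """Train on the oracle-labelled vectors, then split the remaining
    unlabelled vectors into predicted matches and non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(unlabelled_vecs)
    matches = [v for v, p in zip(unlabelled_vecs, pred) if p]
    non_matches = [v for v, p in zip(unlabelled_vecs, pred) if not p]
    return matches, non_matches

# Toy data: non-matches near the origin, matches near (1, 1).
matches, non_matches = svm_split(
    [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0]],  # oracle-labelled vectors
    [0, 0, 1, 1],
    [[0.05, 0.0], [0.95, 1.0]],
)
print(len(matches), len(non_matches))  # → 1 1
```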
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 72
Size, purity, entropy, and estimated match proportion of clusters in queue:
(143, 0.5, 1.0, 0.5)
(74, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 36 / 36
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 74 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 42
Farthest first selection of 42 weight vectors from 74 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.800, 1.000, 0.167, 0.180, 0.151, 0.147, 0.203] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 42 weight vectors
The oracle will correctly classify 42 weight vectors and wrongly classify 0
Classified 4 matches and 38 non-matches
Purity of oracle classification: 0.905
Entropy of oracle classification: 0.454
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
Deleted 42 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(20)647_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 647), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)647_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)585_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 585), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)585_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 537
Number of entity ID pairs that occurred more than once: 0
Analyse set of 537 weight vectors
Containing 209 true matches and 328 true non-matches
(38.92% true matches)
Identified 506 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 491 (97.04%)
2 : 12 (2.37%)
3 : 2 (0.40%)
16 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 506 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 327
Removed 1 non-pure weight vector
Final number of weight vectors to use: 536
Number of unique weight vectors: 506
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(506, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 506 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 506 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 31 matches and 50 non-matches
Purity of oracle classification: 0.617
Entropy of oracle classification: 0.960
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 425 weight vectors
Based on 31 matches and 50 non-matches
Classified 150 matches and 275 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
(275, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
Current size of match and non-match training data sets: 31 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 275 weight vectors
- Estimated match proportion 0.383
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 275 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[1.000, 0.000, 0.864, 0.667, 0.435, 0.700, 0.600] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.846, 0.857, 0.353, 0.318, 0.400] (False)
[0.680, 0.000, 0.609, 0.737, 0.600, 0.529, 0.696] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.762, 0.714, 0.500, 0.400] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[1.000, 0.000, 0.296, 0.600, 0.471, 0.600, 0.643] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 4 matches and 64 non-matches
Purity of oracle classification: 0.941
Entropy of oracle classification: 0.323
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
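The purity and entropy figures reported above can be reproduced with a short sketch, assuming purity is the majority-class fraction of the labelled sample and entropy is the binary Shannon entropy of the match proportion:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total          # match proportion
    purity = max(p, 1.0 - p)         # fraction in the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                  # 0 * log(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# 4 matches and 64 non-matches, as in the oracle step above
purity, entropy = purity_and_entropy(4, 64)
print(round(purity, 3), round(entropy, 3))  # 0.941 0.323
```

With 4 matches out of 68 this yields purity 0.941 and entropy 0.323, matching the log.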
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(20)947_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 947), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)947_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
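A frequency distribution like the one above can be derived from the raw vectors with a pair of counters (a sketch; the weight vectors are assumed to be hashable tuples of floats, and the sample data below is purely illustrative):

```python
from collections import Counter

# Hypothetical weight vectors for illustration
vectors = [
    (1.0, 0.0, 0.5),
    (1.0, 0.0, 0.5),
    (0.9, 1.0, 0.3),
]

vector_counts = Counter(vectors)             # occurrences per unique vector
freq_dist = Counter(vector_counts.values())  # occurrence -> number of unique vectors
total_unique = len(vector_counts)
for occ in sorted(freq_dist):
    n = freq_dist[occ]
    print(f"{occ} : {n} ({100.0 * n / total_unique:.2f}%)")
```

For the three sample vectors this prints one unique vector occurring once and one occurring twice, each 50.00% of the unique vectors.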
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
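The farthest-first selections above can be sketched as a greedy max-min traversal: start from a random vector, then repeatedly add the candidate whose distance to its nearest already-selected vector is largest (Euclidean distance and the tiny point set below are assumptions for illustration):

```python
import random

def farthest_first(vectors, k, seed=42):
    """Greedily pick k vectors, each maximising its min distance to the chosen set."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    rng = random.Random(seed)
    remaining = list(vectors)
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    # Each candidate's distance to its nearest selected vector so far.
    min_dist = [dist(v, selected[0]) for v in remaining]
    while remaining and len(selected) < k:
        i = max(range(len(remaining)), key=lambda j: min_dist[j])
        chosen = remaining.pop(i)
        min_dist.pop(i)
        selected.append(chosen)
        # Shrink each remaining distance if the new pick is closer.
        min_dist = [min(d, dist(v, chosen)) for v, d in zip(remaining, min_dist)]
    return selected

pts = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0)]
picked = farthest_first(pts, 2)
print(picked)
```

Whichever point the random seed starts from, the second pick lands in the opposite corner of the point set, which is the spreading behaviour the sampler relies on.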
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 28 matches and 57 non-matches
Classified 141 matches and 543 non-matches
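The classification step above trains on the oracle-labelled sample and then labels every remaining vector in the cluster. The run uses an SVM; the sketch below substitutes a nearest-centroid rule in pure Python just to illustrate the train-on-sample / classify-remainder pattern (all vectors shown are hypothetical):

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify_remainder(match_sample, non_match_sample, remainder):
    """Label remaining vectors by which labelled-class centroid is closer.

    Nearest-centroid stand-in for the SVM used in the log.
    """
    cm = centroid(match_sample)
    cn = centroid(non_match_sample)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    matches, non_matches = [], []
    for v in remainder:
        (matches if sqdist(v, cm) < sqdist(v, cn) else non_matches).append(v)
    return matches, non_matches

match_sample = [(0.9, 1.0, 0.8), (1.0, 1.0, 0.7)]      # hypothetical oracle matches
non_match_sample = [(1.0, 0.0, 0.3), (0.7, 0.0, 0.4)]  # hypothetical oracle non-matches
rest = [(0.95, 0.9, 0.75), (0.9, 0.1, 0.35)]
m, n = classify_remainder(match_sample, non_match_sample, rest)
print(len(m), len(n))  # 1 1
```

The two classified sub-lists then become the two clusters pushed onto the queue for the next loop.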
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 543 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 543 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 12 matches and 61 non-matches
Purity of oracle classification: 0.836
Entropy of oracle classification: 0.645
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)342_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 342), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)342_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 408
Number of entity ID pairs that occurred more than once: 0
Analyse set of 408 weight vectors
Containing 176 true matches and 232 true non-matches
(43.14% true matches)
Identified 385 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 368 (95.58%)
2 : 14 (3.64%)
3 : 2 (0.52%)
6 : 1 (0.26%)
Identified 0 non-pure unique weight vectors (from 385 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 155
0.000 : 230
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 408
Number of unique weight vectors: 385
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(385, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 385 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using "far" method
Farthest first selection of 77 weight vectors from 385 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 38 matches and 39 non-matches
Purity of oracle classification: 0.506
Entropy of oracle classification: 1.000
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 308 weight vectors
Based on 38 matches and 39 non-matches
Classified 251 matches and 57 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(251, 0.5064935064935064, 0.9998783322990061, 0.4935064935064935)
(57, 0.5064935064935064, 0.9998783322990061, 0.4935064935064935)
Current size of match and non-match training data sets: 38 / 39
Selected cluster (queue ordering: random) with:
- Purity 0.51 and entropy 1.00
- Size 57 weight vectors
- Estimated match proportion 0.494
Sample size for this cluster: 36
Farthest first selection of 36 weight vectors from 57 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.818, 0.727, 0.438, 0.375, 0.400] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
Perform oracle with 100.00% accuracy on 36 weight vectors
The oracle will correctly classify 36 weight vectors and wrongly classify 0
Classified 0 matches and 36 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 36
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 36 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(20)840_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 840), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)840_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
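The purity and entropy figures reported for each oracle round follow the usual binary definitions (majority-class fraction and binary Shannon entropy of the match proportion); a minimal sketch, assuming exactly those definitions:

```python
from math import log2

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a cluster."""
    total = num_matches + num_non_matches
    p = num_matches / total                      # match proportion
    purity = max(p, 1.0 - p)                     # majority-class fraction
    entropy = 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))
    return purity, entropy

# The 23 matches / 65 non-matches classified above:
purity, entropy = purity_entropy(23, 65)
print(round(purity, 3), round(entropy, 3))   # 0.739 0.829
```

These reproduce the 0.739 purity and 0.829 entropy printed in this round.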
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
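Each SVM step trains on the oracle-labelled weight vectors and then classifies the remaining unlabelled vectors of the cluster. A sketch of this step, assuming scikit-learn's `SVC`; the kernel and parameters are assumptions, since the log does not state them:

```python
import numpy as np
from sklearn.svm import SVC

def svm_classify(train_vecs, train_labels, unlabelled_vecs):
    """Train on oracle-labelled weight vectors, classify the rest."""
    clf = SVC(kernel="linear")          # assumed kernel; not shown in the log
    clf.fit(np.asarray(train_vecs), np.asarray(train_labels))
    return clf.predict(np.asarray(unlabelled_vecs))

# Toy illustration with 7-dimensional weight vectors:
train = [[1.0] * 7, [0.9] * 7, [0.1] * 7, [0.0] * 7]
labels = [True, True, False, False]     # match / non-match
preds = svm_classify(train, labels, [[0.95] * 7, [0.05] * 7])
print(preds)
```

The predicted matches and non-matches then become the two child clusters pushed onto the queue.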
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
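The "far" sampling reported in each loop above is a greedy farthest-first traversal: start from one vector, then repeatedly take the vector whose minimum distance to the already-selected set is largest. A sketch, under the assumption of Euclidean distance and an arbitrary starting index:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k vectors (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # minimum distance of every vector to the current selection
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return [vectors[i] for i in selected]

sample = farthest_first([[0.0, 0.0], [1.0, 1.0], [0.1, 0.1], [0.9, 0.9]], 2)
print(sample)   # [[0.0, 0.0], [1.0, 1.0]]
```

This spreads the manual-classification budget over the extremes of the cluster rather than over near-duplicates.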
39.0
Analysing file: diverg(20)459_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 459), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)459_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 813
Number of entity ID pairs that occurred more than once: 0
Analyse set of 813 weight vectors
Containing 209 true matches and 604 true non-matches
(25.71% true matches)
Identified 766 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 731 (95.43%)
2 : 32 (4.18%)
3 : 2 (0.26%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 766 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 583
Removed 1 non-pure weight vector
Final number of weight vectors to use: 812
Number of unique weight vectors: 766
Time to load and analyse the weight vector file: 0.01 sec
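The uniqueness and frequency statistics above can be reproduced with a `Counter` over the weight vectors; a sketch, assuming vectors are compared for exact equality (as the zero count of repeated entity ID pairs suggests):

```python
from collections import Counter

def analyse_vectors(vectors):
    """Number of unique weight vectors and their frequency distribution."""
    counts = Counter(tuple(v) for v in vectors)
    # occurrence -> number of unique vectors that occur that often
    freq = Counter(counts.values())
    return len(counts), dict(sorted(freq.items()))

# Toy illustration with 2-dimensional vectors:
vecs = [(1.0, 0.5)] * 3 + [(0.2, 0.2)] * 2 + [(0.0, 1.0)]
unique, freq = analyse_vectors(vecs)
print(unique, freq)   # 3 {1: 1, 2: 1, 3: 1}
```

Pureness per unique vector would additionally group the (True)/(False) labels of each duplicate set and report the match fraction.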
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(766, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 766 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using the "far" method
Farthest first selection of 85 weight vectors from 766 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 26 matches and 59 non-matches
Purity of oracle classification: 0.694
Entropy of oracle classification: 0.888
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 681 weight vectors
Based on 26 matches and 59 non-matches
Classified 126 matches and 555 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(126, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
(555, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
Current size of match and non-match training data sets: 26 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 126 weight vectors
- Estimated match proportion 0.306
Sample size for this cluster: 50
Farthest first selection of 50 weight vectors from 126 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 50 weight vectors
The oracle will correctly classify 50 weight vectors and wrongly classify 0
Classified 48 matches and 2 non-matches
Purity of oracle classification: 0.960
Entropy of oracle classification: 0.242
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 50 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)133_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.980198
recall 0.331104
f-measure 0.495
da 101
dm 0
ndm 0
tp 99
fp 2
tn 4.76529e+07
fn 200
Name: (10, 1 - acm diverg, 133), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)133_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 159
Number of entity ID pairs that occurred more than once: 0
Analyse set of 159 weight vectors
Containing 136 true matches and 23 true non-matches
(85.53% true matches)
Identified 148 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 141 (95.27%)
2 : 4 (2.70%)
3 : 2 (1.35%)
4 : 1 (0.68%)
Identified 0 non-pure unique weight vectors (from 148 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 125
0.000 : 23
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 159
Number of unique weight vectors: 148
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 148 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 58
Perform initial selection using the "far" method
Farthest first selection of 58 weight vectors from 148 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 39 matches and 19 non-matches
Purity of oracle classification: 0.672
Entropy of oracle classification: 0.912
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 19
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 90 weight vectors
Based on 39 matches and 19 non-matches
Classified 90 matches and 0 non-matches
101.0
Analysing file: diverg(15)870_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 870), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)870_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 678
Number of entity ID pairs that occurred more than once: 0
Analyse set of 678 weight vectors
Containing 215 true matches and 463 true non-matches
(31.71% true matches)
Identified 626 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 590 (94.25%)
2 : 33 (5.27%)
3 : 2 (0.32%)
16 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 626 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 442
Removed 1 non-pure weight vector
Final number of weight vectors to use: 677
Number of unique weight vectors: 626
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(626, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 626 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using the "far" method
Farthest first selection of 83 weight vectors from 626 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 543 weight vectors
Based on 28 matches and 55 non-matches
Classified 134 matches and 409 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(134, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(409, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 409 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 409 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 14 matches and 57 non-matches
Purity of oracle classification: 0.803
Entropy of oracle classification: 0.716
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
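Farthest-first selection, used above to draw the sample handed to the oracle, starts from one vector and then repeatedly adds the vector whose distance to its nearest already-selected vector is largest, so the sample spreads across the cluster. A sketch under the assumption of Euclidean distance and a random starting point:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Select k vectors: start from one vector, then repeatedly add the
    vector whose distance to the nearest selected vector is largest."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vectors)))]   # arbitrary start
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))             # farthest remaining vector
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return vectors[selected]

# Mirrors "Farthest first selection of 71 weight vectors from 409 vectors"
sample = farthest_first(np.random.default_rng(1).random((409, 7)), 71)
print(sample.shape)  # (71, 7)
```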
Reached end of manual classification budget
43.0
Analysing the file: diverg(10)442_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 442), dtype: object
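The precision, recall, and f-measure values in the record above follow directly from the confusion counts (tp, fp, fn); a quick check:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * tp / (2 * tp + fp + fn)  # equivalent to 2PR / (P + R)
    return precision, recall, f_measure

# Counts from the record above: tp 45, fp 1, fn 254
p, r, f = prf(45, 1, 254)
print(round(p, 6), round(r, 6), round(f, 5))  # 0.978261 0.150502 0.26087
```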
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)442_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1021
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1021 weight vectors
Containing 207 true matches and 814 true non-matches
(20.27% true matches)
Identified 965 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 929 (96.27%)
2 : 33 (3.42%)
3 : 2 (0.21%)
20 : 1 (0.10%)
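A distribution like the one above can be produced by first counting how often each distinct weight vector occurs and then counting those counts; a sketch (the helper name is illustrative):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct vectors with that count."""
    per_vector = Counter(tuple(v) for v in vectors)  # vector -> occurrences
    return Counter(per_vector.values())              # occurrences -> #vectors

# Toy data: 'a' occurs 3 times, 'b' twice, 'c' and 'd' once each
vecs = [('a',)] * 3 + [('b',)] * 2 + [('c',), ('d',)]
print(sorted(occurrence_distribution(vecs).items()))  # [(1, 2), (2, 1), (3, 1)]
```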
Identified 1 non-pure unique weight vector (from 965 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 171
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 793
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1020
Number of unique weight vectors: 965
Time to load and analyse the weight vector file: 0.01 sec
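The removal of the non-pure weight vector above (the unique vector that occurred 20 times, 19 as a match and once as a non-match, hence pureness 0.950) amounts to dropping the minority-label occurrences of any vector that appears with both labels. A sketch with an illustrative helper; ties between labels are not handled specially here:

```python
from collections import defaultdict

def remove_non_pure_minority(vectors, labels):
    """For any weight vector that occurs with both labels, drop the
    occurrences carrying the minority label."""
    counts = defaultdict(lambda: {True: 0, False: 0})
    for v, lab in zip(vectors, labels):
        counts[tuple(v)][lab] += 1
    keep = []
    for v, lab in zip(vectors, labels):
        c = counts[tuple(v)]
        # keep pure vectors, and majority-label occurrences of mixed ones
        if c[True] == 0 or c[False] == 0 or c[lab] > c[not lab]:
            keep.append((v, lab))
    return keep

# Toy data: vector 'x' occurs 19 times as a match and once as a non-match
data = [(('x',), True)] * 19 + [(('x',), False)] + [(('y',), False)] * 5
kept = remove_non_pure_minority([v for v, _ in data], [l for _, l in data])
print(len(kept))  # 24 (the single non-match occurrence of 'x' is removed)
```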
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(965, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 965 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 965 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 878 weight vectors
Based on 28 matches and 59 non-matches
Classified 311 matches and 567 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(311, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(567, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 567 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 567 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.368, 0.235, 0.429] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 0 matches and 73 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(10)836_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 836), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)836_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 663
Number of entity ID pairs that occurred more than once: 0
Analyse set of 663 weight vectors
Containing 212 true matches and 451 true non-matches
(31.98% true matches)
Identified 608 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 572 (94.08%)
2 : 33 (5.43%)
3 : 2 (0.33%)
19 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 608 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 430
Removed 1 non-pure weight vector
Final number of weight vectors to use: 662
Number of unique weight vectors: 608
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(608, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 608 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 608 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 26 matches and 57 non-matches
Purity of oracle classification: 0.687
Entropy of oracle classification: 0.897
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 525 weight vectors
Based on 26 matches and 57 non-matches
Classified 200 matches and 325 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(200, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
(325, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
Current size of match and non-match training data sets: 26 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.90
- Size 200 weight vectors
- Estimated match proportion 0.313
Sample size for this cluster: 59
Farthest first selection of 59 weight vectors from 200 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.530, 1.000, 0.159, 0.086, 0.182, 0.159, 0.163] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 59 weight vectors
The oracle will correctly classify 59 weight vectors and wrongly classify 0
Classified 41 matches and 18 non-matches
Purity of oracle classification: 0.695
Entropy of oracle classification: 0.887
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 18
Number of false non-matches: 0
Deleted 59 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(15)298_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 298), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)298_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 722
Number of entity ID pairs that occurred more than once: 0
Analyse set of 722 weight vectors
Containing 219 true matches and 503 true non-matches
(30.33% true matches)
Identified 686 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 667 (97.23%)
2 : 16 (2.33%)
3 : 2 (0.29%)
17 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 686 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 500
Removed 1 non-pure weight vector
Final number of weight vectors to use: 721
Number of unique weight vectors: 686
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(686, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 686 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 686 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
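The farthest-first selection reported above is a greedy max-min traversal: each new pick maximises the minimum distance to the vectors already selected. A plain sketch follows (the seeding rule and Euclidean metric are assumptions; the actual implementation may differ):

```python
import math

def farthest_first(vectors, k):
    """Greedily select k vectors, each maximising the minimum Euclidean
    distance to the vectors selected so far (seeded with the first vector)."""
    selected = [vectors[0]]
    min_d = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=min_d.__getitem__)
        selected.append(vectors[i])
        # tighten each vector's distance to the nearest selected vector
        min_d = [min(d, math.dist(v, vectors[i])) for v, d in zip(vectors, min_d)]
    return selected
```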
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 35 matches and 49 non-matches
Purity of oracle classification: 0.583
Entropy of oracle classification: 0.980
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 49
Number of false non-matches: 0
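The purity and entropy figures the oracle step reports (0.583 and 0.980 for 35 matches against 49 non-matches) follow the standard majority-fraction and binary Shannon entropy definitions:

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity is the majority-class fraction of the sample; entropy is the
    binary Shannon entropy (in bits) of the match/non-match split."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = 0.0 if p in (0.0, 1.0) else \
        -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

purity, entropy = purity_entropy(35, 49)
# purity ≈ 0.583, entropy ≈ 0.980, matching the values logged above
```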
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 602 weight vectors
Based on 35 matches and 49 non-matches
Classified 289 matches and 313 non-matches
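The SVM step trains on the oracle-labelled sample and assigns each remaining unlabelled vector to the match or non-match side; a minimal scikit-learn sketch (the kernel choice and default parameters here are assumptions, not necessarily what this run used):

```python
from sklearn import svm

def svm_split(train_vecs, train_labels, unlabelled_vecs):
    """Fit an SVM on oracle-classified weight vectors and split the
    remaining vectors into predicted matches and non-matches."""
    clf = svm.SVC(kernel="linear")     # assumed kernel
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(unlabelled_vecs)
    matches = [v for v, p in zip(unlabelled_vecs, pred) if p]
    non_matches = [v for v, p in zip(unlabelled_vecs, pred) if not p]
    return matches, non_matches
```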
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(289, 0.5833333333333334, 0.9798687566511527, 0.4166666666666667)
(313, 0.5833333333333334, 0.9798687566511527, 0.4166666666666667)
Current size of match and non-match training data sets: 35 / 49
Selected cluster (queue ordering: random) with:
- Purity 0.58 and entropy 0.98
- Size 313 weight vectors
- Estimated match proportion 0.417
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 313 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.700, 0.645, 0.316, 0.455, 0.714] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.667, 0.857, 0.353, 0.632, 0.550] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.296, 0.600, 0.471, 0.600, 0.643] (False)
[0.667, 0.000, 0.800, 0.684, 0.667, 0.529, 0.609] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.950, 0.000, 0.619, 0.800, 0.478, 0.280, 0.625] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.611, 0.000, 0.800, 0.684, 0.500, 0.778, 0.609] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.533, 0.000, 0.577, 0.783, 0.429, 0.615, 0.478] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.600, 0.700, 0.600, 0.611, 0.706] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 0 matches and 72 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)549_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 549), dtype: object
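The precision, recall, and f-measure values in the summary above follow directly from the reported tp/fp/fn counts:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf(102, 1, 197)
# p ≈ 0.990291, r ≈ 0.341137, f ≈ 0.507463, as reported above
```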
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)549_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 742
Number of entity ID pairs that occurred more than once: 0
Analyse set of 742 weight vectors
Containing 163 true matches and 579 true non-matches
(21.97% true matches)
Identified 721 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 710 (98.47%)
2 : 8 (1.11%)
3 : 2 (0.28%)
10 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 721 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 144
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 576
Removed 1 non-pure weight vector
Final number of weight vectors to use: 741
Number of unique weight vectors: 721
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(721, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 721 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 721 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 637 weight vectors
Based on 29 matches and 55 non-matches
Classified 106 matches and 531 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(106, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(531, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 106 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 106 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 43 matches and 5 non-matches
Purity of oracle classification: 0.896
Entropy of oracle classification: 0.482
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing file: diverg(10)331_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (10, 1 - acm diverg, 331), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)331_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 961
Number of entity ID pairs that occurred more than once: 0
Analyse set of 961 weight vectors
Containing 165 true matches and 796 true non-matches
(17.17% true matches)
Identified 924 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 893 (96.65%)
2 : 28 (3.03%)
3 : 2 (0.22%)
6 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 924 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 148
0.000 : 776
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 961
Number of unique weight vectors: 924
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(924, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 924 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 924 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 837 weight vectors
Based on 31 matches and 56 non-matches
Classified 262 matches and 575 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(262, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(575, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 262 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 66
Farthest first selection of 66 weight vectors from 262 vectors
The selected farthest weight vectors are:
[0.512, 1.000, 0.087, 0.190, 0.107, 0.226, 0.204] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
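The "farthest first selection" step reported throughout this log can be sketched as a greedy farthest-first traversal. This is a minimal sketch, not the actual implementation: the Euclidean metric, the random choice of the first vector, and the synthetic 262x7 cluster are all assumptions.

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int, seed: int = 0) -> list[int]:
    """Greedy farthest-first traversal: pick a start vector at random, then
    repeatedly add the vector whose minimum distance to the selected set
    is largest."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vectors)))]
    # Minimum Euclidean distance from every vector to the selected set so far.
    dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(dist.argmax())
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

# Hypothetical stand-in for the cluster of 262 seven-dimensional weight vectors.
rng = np.random.default_rng(1)
cluster = rng.uniform(0.0, 1.0, size=(262, 7))
sample = farthest_first(cluster, k=66)
print(len(sample), len(set(sample)))  # 66 distinct indices
```

Because a selected vector's minimum distance drops to zero after the update, `argmax` never re-selects it, so the k indices are distinct.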
Perform oracle with 100.00% accuracy on 66 weight vectors
The oracle will correctly classify 66 weight vectors and wrongly classify 0
Classified 42 matches and 24 non-matches
Purity of oracle classification: 0.636
Entropy of oracle classification: 0.946
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 24
Number of false non-matches: 0
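The purity and entropy figures reported after each oracle round follow directly from the binary match/non-match split. A minimal sketch (the helper name is mine, not from the log) reproducing the 42/24 values above:

```python
import math

def purity_entropy(n_match: int, n_nonmatch: int) -> tuple[float, float]:
    """Purity = fraction of the majority class; entropy = Shannon entropy
    (base 2) of the match/non-match distribution."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Reproduce the figures for the 42 matches / 24 non-matches classified above.
purity, entropy = purity_entropy(42, 24)
print(f"purity={purity:.3f} entropy={entropy:.3f}")  # purity=0.636 entropy=0.946
```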
Deleted 66 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing the file: diverg(20)879_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (20, 1 - acm diverg, 879), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)879_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 908
Number of entity ID pairs that occurred more than once: 0
Analyse set of 908 weight vectors
Containing 204 true matches and 704 true non-matches
(22.47% true matches)
Identified 859 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 825 (96.04%)
2 : 31 (3.61%)
3 : 2 (0.23%)
15 : 1 (0.12%)
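An occurrence histogram like the one above can be built with two nested `collections.Counter` passes: one counting how often each vector occurs, a second counting how many vectors share each occurrence count. The tuples below are hypothetical stand-ins for the 7-dimensional weight vectors.

```python
from collections import Counter

# Hypothetical weight vectors; duplicates represent identical similarity
# vectors produced by different record pairs.
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.2, 0.8), (0.2, 0.8), (0.2, 0.8)]

occ = Counter(vectors)        # vector -> how often it occurs
freq = Counter(occ.values())  # occurrence count -> number of vectors
for times, count in sorted(freq.items()):
    print(f"{times} : {count}")
# prints: 1 : 1 / 2 : 1 / 3 : 1
```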
Identified 1 non-pure unique weight vector (from 859 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.933 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 683
Removed 1 non-pure weight vector
Final number of weight vectors to use: 907
Number of unique weight vectors: 859
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(859, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 859 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 859 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 27 matches and 59 non-matches
Purity of oracle classification: 0.686
Entropy of oracle classification: 0.898
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 773 weight vectors
Based on 27 matches and 59 non-matches
Classified 76 matches and 697 non-matches
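The SVM split step above trains on the oracle-labelled samples (27 matches, 59 non-matches) and classifies the remaining 773 vectors to divide the cluster. A sketch assuming scikit-learn's `SVC`; the kernel choice and the synthetic feature ranges here are assumptions, since the log does not show them.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the oracle-labelled 7-dimensional weight vectors:
# matches skewed toward high similarities, non-matches toward low ones.
X_match = rng.uniform(0.6, 1.0, size=(27, 7))
X_nonmatch = rng.uniform(0.0, 0.4, size=(59, 7))
X_train = np.vstack([X_match, X_nonmatch])
y_train = np.array([1] * 27 + [0] * 59)

clf = SVC(kernel="linear").fit(X_train, y_train)

# Classify the remaining unlabelled vectors; the two predicted groups become
# the child clusters pushed onto the queue.
X_rest = rng.uniform(0.0, 1.0, size=(773, 7))
pred = clf.predict(X_rest)
print(f"Classified {int(pred.sum())} matches and {int((pred == 0).sum())} non-matches")
```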
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(76, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
(697, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
Current size of match and non-match training data sets: 27 / 59
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.90
- Size 697 weight vectors
- Estimated match proportion 0.314
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 697 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.333, 0.545, 0.476, 0.727, 0.762] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 19 matches and 55 non-matches
Purity of oracle classification: 0.743
Entropy of oracle classification: 0.822
Number of true matches: 19
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing the file: diverg(15)751_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 751), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)751_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1077
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1077 weight vectors
Containing 221 true matches and 856 true non-matches
(20.52% true matches)
Identified 1021 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 985 (96.47%)
2 : 33 (3.23%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1021 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 835
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1076
Number of unique weight vectors: 1021
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1021, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1021 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1021 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 934 weight vectors
Based on 30 matches and 57 non-matches
Classified 170 matches and 764 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(170, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(764, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 170 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 170 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 43 matches and 15 non-matches
Purity of oracle classification: 0.741
Entropy of oracle classification: 0.825
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 15
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(20)551_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 551), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)551_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
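The "farthest first" selection reported above is a greedy k-center strategy: start from one vector, then repeatedly add the vector whose distance to the nearest already-selected vector is largest. A minimal sketch (a generic reconstruction, not the tool's actual implementation; function and variable names are mine):

```python
import math

def farthest_first(vectors, k):
    """Greedy k-center selection: start from the first vector, then
    repeatedly pick the vector farthest from all vectors chosen so far."""
    selected = [vectors[0]]
    while len(selected) < min(k, len(vectors)):
        # Each candidate is scored by its distance to the NEAREST selected
        # vector; the candidate maximising that score is added next.
        best = max(
            (v for v in vectors if v not in selected),
            key=lambda v: min(math.dist(v, s) for s in selected),
        )
        selected.append(best)
    return selected

vecs = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0), (10.0, 0.0)]
print(farthest_first(vecs, 2))  # picks the two mutually farthest points
```

This deliberately spreads the labelling budget across the similarity space instead of sampling near one region.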
Perform oracle with 100.00 accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 28 matches and 60 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
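The purity and entropy figures in the oracle block follow the standard definitions: purity is the majority-class proportion of the labelled sample, and entropy is the binary Shannon entropy of the match/non-match split. A sketch (names are mine) that reproduces the 28-match / 60-non-match figures above:

```python
import math

def purity_entropy(n_matches, n_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1 - p)                                # majority proportion
    entropy = sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy

# 28 matches and 60 non-matches, as in the oracle step above
purity, entropy = purity_entropy(28, 60)
print(f"{purity:.3f} {entropy:.3f}")  # 0.682 0.902
```

A cluster whose sample purity stays below the configured threshold (and whose size permits it) is split further, as the next log line states.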
SVM classification of 956 weight vectors
Based on 28 matches and 60 non-matches
Classified 156 matches and 800 non-matches
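The SVM step trains a binary classifier on the oracle-labelled vectors (28 matches, 60 non-matches) and predicts labels for the remaining unlabelled vectors in the cluster. A minimal scikit-learn sketch under assumed data (the training vectors here are synthetic stand-ins, and the log's tool may configure the SVM differently):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for oracle-labelled 7-dimensional weight vectors:
# high-similarity vectors as matches (1), low-similarity as non-matches (0).
X_train = np.vstack([rng.uniform(0.7, 1.0, (28, 7)),   # 28 matches
                     rng.uniform(0.0, 0.4, (60, 7))])  # 60 non-matches
y_train = np.array([1] * 28 + [0] * 60)

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Classify the remaining (unlabelled) weight vectors of the cluster
X_rest = rng.uniform(0.0, 1.0, (100, 7))
pred = clf.predict(X_rest)
print(f"Classified {int(pred.sum())} matches and "
      f"{int((pred == 0).sum())} non-matches")
```

The two predicted classes then become the two new clusters appended to the queue, which is why the queue length grows to 2 in the next loop.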
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
(800, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
Current size of match and non-match training data sets: 28 / 60
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 800 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 800 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.067, 0.550, 0.636, 0.500, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.333, 0.545, 0.476, 0.727, 0.762] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00 accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 4 matches and 71 non-matches
Purity of oracle classification: 0.947
Entropy of oracle classification: 0.300
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)834_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 834), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)834_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 781
Number of entity ID pairs that occurred more than once: 0
Analyse set of 781 weight vectors
Containing 206 true matches and 575 true non-matches
(26.38% true matches)
Identified 752 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 735 (97.74%)
2 : 14 (1.86%)
3 : 2 (0.27%)
12 : 1 (0.13%)
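The occurrence distribution above is a frequency-of-frequencies count: how many distinct weight vectors appear once, twice, and so on. A generic sketch with `collections.Counter` (names are mine):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of distinct vectors occurring that often."""
    per_vector = Counter(map(tuple, weight_vectors))  # vector -> occurrences
    dist = Counter(per_vector.values())               # occurrences -> #vectors
    total = len(per_vector)
    for occ in sorted(dist):
        print(f"{occ} : {dist[occ]} ({dist[occ] / total:.2%})")
    return dist

occurrence_distribution([[1.0], [1.0], [0.5], [0.2]])
```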
Identified 1 non-pure unique weight vector (from 752 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 572
Removed 1 non-pure weight vector
Final number of weight vectors to use: 780
Number of unique weight vectors: 752
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(752, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 752 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 752 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
Perform oracle with 100.00 accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 667 weight vectors
Based on 29 matches and 56 non-matches
Classified 141 matches and 526 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(526, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 526 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 526 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.538, 0.613, 0.789, 0.227, 0.857] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00 accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 7 matches and 67 non-matches
Purity of oracle classification: 0.905
Entropy of oracle classification: 0.452
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(15)362_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.99
recall 0.331104
f-measure 0.496241
da 100
dm 0
ndm 0
tp 99
fp 1
tn 4.76529e+07
fn 200
Name: (15, 1 - acm diverg, 362), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)362_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 745
Number of entity ID pairs that occurred more than once: 0
Analyse set of 745 weight vectors
Containing 166 true matches and 579 true non-matches
(22.28% true matches)
Identified 724 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 713 (98.48%)
2 : 8 (1.10%)
3 : 2 (0.28%)
10 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 724 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 147
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 576
Removed 1 non-pure weight vector
Final number of weight vectors to use: 744
Number of unique weight vectors: 724
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(724, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 724 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 724 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00 accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 639 weight vectors
Based on 30 matches and 55 non-matches
Classified 109 matches and 530 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(530, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.94
- Size 109 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 109 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
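The "farthest first" selections logged throughout can be reproduced with a greedy farthest-first traversal: each new pick maximises the minimum distance to the vectors already chosen. A minimal sketch, assuming Euclidean distance and an arbitrary first pick (the log does not show the tool's exact metric or seeding):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedily select k vectors; each new pick maximises the minimum
    Euclidean distance to the vectors already selected."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]        # arbitrary first pick
    # distance of every vector to its closest already-selected vector
    min_dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))            # farthest from all picks
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Already-selected vectors have min_dist 0, so they are never picked again while distinct vectors remain.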
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 45 matches and 4 non-matches
Purity of oracle classification: 0.918
Entropy of oracle classification: 0.408
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
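The purity and entropy figures reported after each oracle pass are consistent with the standard two-class definitions (majority-class share, and Shannon entropy of the match/non-match split). A sketch that reproduces the values above for 45 matches and 4 non-matches; the tool's own implementation is not shown in the log:

```python
from math import log2

def purity_entropy(n_match, n_nonmatch):
    """Purity = share of the majority class; entropy = base-2 Shannon
    entropy of the two-class match/non-match distribution."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:                     # 0 * log2(0) is taken as 0
            entropy -= q * log2(q)
    return purity, entropy
```

For the split above, `purity_entropy(45, 4)` gives approximately (0.918, 0.408), matching the logged values.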
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing file: diverg(10)678_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984615
recall 0.214047
f-measure 0.351648
da 65
dm 0
ndm 0
tp 64
fp 1
tn 4.76529e+07
fn 235
Name: (10, 1 - acm diverg, 678), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)678_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 222
Number of entity ID pairs that occurred more than once: 0
Analyse set of 222 weight vectors
Containing 183 true matches and 39 true non-matches
(82.43% true matches)
Identified 197 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 186 (94.42%)
2 : 8 (4.06%)
3 : 2 (1.02%)
14 : 1 (0.51%)
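An occurrence histogram like the one above (how many unique weight vectors appear exactly once, twice, etc.) can be computed with two nested Counters. A sketch, assuming each weight vector is a row of floats:

```python
from collections import Counter

def occurrence_histogram(vectors):
    """Count how often each weight vector occurs, then count how many
    unique vectors share each occurrence count."""
    per_vector = Counter(tuple(v) for v in vectors)   # vector -> copies
    return Counter(per_vector.values())               # copies -> vectors
```

For example, a list with one vector duplicated and two singletons yields `{2: 1, 1: 2}`.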
Identified 1 non-pure unique weight vector (from 197 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 158
0.929 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 38
Removed 1 non-pure weight vector
Final number of weight vectors to use: 221
Number of unique weight vectors: 197
Time to load and analyse the weight vector file: 0.00 sec
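The non-pure clean-up logged above drops the minority-class copies of any duplicated weight vector: here the vector with pureness 0.929 (13 matches and 1 non-match among 14 identical copies) loses its single non-match copy, taking 222 vectors down to 221. A sketch of that step; keeping ties is an assumption, the tool may handle them differently:

```python
from collections import Counter, defaultdict

def remove_minority_copies(vectors, labels):
    """For each unique weight vector, drop the copies whose label is in
    the minority among that vector's identical copies."""
    groups = defaultdict(list)
    for v, lab in zip(vectors, labels):
        groups[tuple(v)].append(lab)
    kept_v, kept_l = [], []
    for v, lab in zip(vectors, labels):
        counts = Counter(groups[tuple(v)])
        if counts[lab] >= max(counts.values()):   # keep majority (and ties)
            kept_v.append(v)
            kept_l.append(lab)
    return kept_v, kept_l
```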
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(197, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 197 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 65
Perform initial selection using "far" method
Farthest first selection of 65 weight vectors from 197 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 65 weight vectors
The oracle will correctly classify 65 weight vectors and wrongly classify 0
Classified 39 matches and 26 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 65 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 132 weight vectors
Based on 39 matches and 26 non-matches
Classified 132 matches and 0 non-matches
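The SVM step trains on the weight vectors the oracle has labelled so far and classifies the remaining vectors in the cluster. A minimal scikit-learn sketch; the use of `sklearn.svm.SVC` with a linear kernel and default parameters is an assumption, since the log does not name the library or kernel:

```python
from sklearn import svm

def classify_remaining(train_vectors, train_labels, remaining_vectors):
    """Fit an SVM on the oracle-classified weight vectors and predict
    match / non-match labels for the still-unlabelled ones."""
    clf = svm.SVC(kernel="linear")        # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    return clf.predict(remaining_vectors)
```

As the log shows (132 of 132 predicted as matches here), a heavily skewed or small training set can push the decision boundary to one side; the resulting child clusters are then queued for further splitting.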
65.0
Analysing file: diverg(20)509_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 509), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)509_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.05 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 28 matches and 57 non-matches
Classified 141 matches and 543 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 141 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 141 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)942_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 942), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)942_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 566 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 566 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 7 matches and 66 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.456
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
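The purity and entropy figures logged above follow the usual two-class definitions: purity is the majority-class fraction of the sample, and entropy is the binary Shannon entropy of the match/non-match split. A minimal sketch (function names here are illustrative, not from the tool itself):

```python
import math

def purity(n_match, n_nonmatch):
    # fraction of the sample belonging to the majority class
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    # binary Shannon entropy of the match / non-match split
    total = n_match + n_nonmatch
    h = 0.0
    for count in (n_match, n_nonmatch):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

# Reproduces the figures logged above for 7 matches / 66 non-matches
print(round(purity(7, 66), 3))   # 0.904
print(round(entropy(7, 66), 3))  # 0.456
```

The same formulas reproduce the later oracle blocks as well, e.g. 25 matches / 63 non-matches gives purity 0.716 and entropy 0.861.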
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)278_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 278), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)278_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
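A distribution like the one above can be computed with two nested `Counter` passes over the (hashable) weight vectors; a sketch, assuming the vectors are stored as tuples, with a hypothetical small input in place of the 1093 loaded vectors:

```python
from collections import Counter

# Hypothetical small input; the real run uses the loaded weight vectors.
weight_vectors = [
    (1.0, 0.0), (1.0, 0.0), (0.5, 0.5),
    (0.2, 0.9), (0.2, 0.9), (0.2, 0.9),
]

occurrences = Counter(weight_vectors)        # unique vector -> how often it occurs
distribution = Counter(occurrences.values()) # occurrence count -> number of unique vectors

n_unique = len(occurrences)
for times, count in sorted(distribution.items()):
    # percentages are relative to the number of unique vectors,
    # matching the log output above (e.g. 999 / 1036 = 96.43%)
    print(f"{times} : {count} ({100.0 * count / n_unique:.2f}%)")
```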
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
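The "far" method above is presumably a greedy farthest-first traversal: start from a seed vector and repeatedly add the candidate whose minimum Euclidean distance to the already-selected set is largest. A sketch under that assumption (the seeding rule here, taking the first vector, is a guess):

```python
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors that are maximally spread out."""
    selected = [vectors[0]]  # assumed seed: the first vector
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # pick the remaining vector farthest from its nearest selected vector
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

On the cluster above this would be invoked with k=88 over the 1036 unique 7-dimensional weight vectors.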
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 817 non-matches
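The SVM step trains on the oracle-labelled sample and then classifies the rest of the cluster. A sketch assuming scikit-learn's `SVC` (the kernel choice and the synthetic stand-in training data below are assumptions, not taken from the tool):

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# Hypothetical training data standing in for the 25 oracle-labelled
# matches and 63 non-matches (label 1 = match, 0 = non-match).
rng = np.random.default_rng(0)
matches = rng.uniform(0.7, 1.0, size=(25, 7))
non_matches = rng.uniform(0.0, 0.5, size=(63, 7))
X_train = np.vstack([matches, non_matches])
y_train = np.array([1] * 25 + [0] * 63)

clf = SVC(kernel="linear")  # assumed kernel
clf.fit(X_train, y_train)

# Classify the remaining (unlabelled) weight vectors of the cluster;
# the predicted labels split it into a match and a non-match sub-cluster.
unlabelled = np.array([[0.9] * 7, [0.1] * 7])
predicted = clf.predict(unlabelled)
```

The two predicted classes correspond to the 131-vector and 817-vector clusters that appear in the queue in the next loop.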
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 131 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)321_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 321), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)321_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 649
Number of entity ID pairs that occurred more than once: 0
Analyse set of 649 weight vectors
Containing 199 true matches and 450 true non-matches
(30.66% true matches)
Identified 622 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 606 (97.43%)
2 : 13 (2.09%)
3 : 2 (0.32%)
11 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 622 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 447
Removed 1 non-pure weight vector
Final number of weight vectors to use: 648
Number of unique weight vectors: 622
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(622, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 622 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 622 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 27 matches and 56 non-matches
Purity of oracle classification: 0.675
Entropy of oracle classification: 0.910
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 539 weight vectors
Based on 27 matches and 56 non-matches
Classified 127 matches and 412 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(127, 0.6746987951807228, 0.9100534290139191, 0.3253012048192771)
(412, 0.6746987951807228, 0.9100534290139191, 0.3253012048192771)
Current size of match and non-match training data sets: 27 / 56
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 412 weight vectors
- Estimated match proportion 0.325
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 412 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 12 matches and 58 non-matches
Purity of oracle classification: 0.829
Entropy of oracle classification: 0.661
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(10)584_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 584), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)584_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 555
Number of entity ID pairs that occurred more than once: 0
Analyse set of 555 weight vectors
Containing 173 true matches and 382 true non-matches
(31.17% true matches)
Identified 537 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 528 (98.32%)
2 : 6 (1.12%)
3 : 2 (0.37%)
9 : 1 (0.19%)
Identified 1 non-pure unique weight vector (from 537 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 155
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 381
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 546
Number of unique weight vectors: 536
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(536, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 536 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 536 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.850, 0.733, 0.652, 0.778, 0.474] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 27 matches and 54 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 455 weight vectors
Based on 27 matches and 54 non-matches
Classified 114 matches and 341 non-matches
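The SVM step above trains on the oracle-labelled sample (27 matches, 54 non-matches) and then labels the remaining 455 vectors of the cluster. The log does not show the SVM implementation or its parameters, so the following is only a sketch of the train-on-sample / classify-the-rest pattern, using a nearest-centroid stand-in on synthetic data instead of an actual SVM:

```python
import random

random.seed(0)

def rand_vec(lo, hi):
    """Synthetic 7-dimensional similarity vector (stand-in for the log's data)."""
    return [random.uniform(lo, hi) for _ in range(7)]

# Synthetic stand-in data: oracle-labelled matches skew high, non-matches low.
matches = [rand_vec(0.6, 1.0) for _ in range(27)]
non_matches = [rand_vec(0.0, 0.6) for _ in range(54)]
rest = [rand_vec(0.0, 1.0) for _ in range(455)]

def centroid(vecs):
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Train on the oracle sample, then classify every remaining vector.
# (Nearest-centroid here is a stand-in for the log's SVM, whose kernel
# and parameters are not shown.)
c_match, c_non = centroid(matches), centroid(non_matches)
pred = [dist2(v, c_match) < dist2(v, c_non) for v in rest]
print(f"Classified {sum(pred)} matches and {len(pred) - sum(pred)} non-matches")
```

The counts printed here will not match the log's 114/341 split, since both the data and the classifier are placeholders.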
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(114, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(341, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 27 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.92
- Size 114 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 114 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 47 matches and 2 non-matches
Purity of oracle classification: 0.959
Entropy of oracle classification: 0.246
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
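The purity and entropy figures reported after each oracle round follow directly from the match/non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy of the split. A minimal sketch (function names are illustrative, not from the original program):

```python
import math

def purity(n_match: int, n_nonmatch: int) -> float:
    """Fraction of the sample belonging to the majority class."""
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match: int, n_nonmatch: int) -> float:
    """Binary Shannon entropy (in bits) of the match/non-match split."""
    total = n_match + n_nonmatch
    h = 0.0
    for count in (n_match, n_nonmatch):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

# Counts from the oracle round above: 47 matches, 2 non-matches
print(round(purity(47, 2), 3))   # 0.959
print(round(entropy(47, 2), 3))  # 0.246
```

The same functions reproduce the other rounds in the log, e.g. 27 matches / 54 non-matches gives purity 0.667 and entropy 0.918.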
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing file: diverg(20)1_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 1), dtype: object
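The precision, recall, and f-measure values in the Series dump above are consistent with its tp/fp/fn counts; a quick check (the helper name is illustrative):

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts from the dump above: tp=42, fp=0, fn=257
p, r, f = prf(42, 0, 257)
print(round(p, 6), round(r, 6), round(f, 6))  # 1.0 0.140468 0.246334
```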
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)1_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
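The pureness filter in the analysis above groups identical weight vectors, computes each group's fraction of matches, and drops the minority-class copies of any non-pure group (here: one vector occurring 17 times as 16 matches and 1 non-match, pureness 16/17 = 0.941, so its single minority copy is removed and 1051 of 1052 vectors remain). A minimal sketch, with an illustrative function name:

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """Group identical weight vectors; for any non-pure group (mixed labels),
    keep only the majority-class copies so every unique vector becomes pure."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, labs in groups.items():
        majority = sum(labs) * 2 >= len(labs)   # True (match) wins ties
        kept.extend((list(vec), majority) for lab in labs if lab == majority)
    return kept

# Toy reproduction of the case above: one vector occurs 17 times,
# 16 times as a match and once as a non-match (pureness 16/17 = 0.941).
vecs = [[1.0] * 7] * 17
labs = [True] * 16 + [False]
kept = remove_minority_copies(vecs, labs)
print(len(kept))  # 16
```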
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
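Samples like the one listed above come from a greedy farthest-first traversal: start from one vector, then repeatedly pick the vector whose minimum distance to the already-selected set is largest. A sketch under assumptions the log does not confirm (Euclidean distance, seeding from the first vector):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly select the vector whose
    minimum Euclidean distance to the already-selected set is largest."""
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy run on 1-D points: starting at 0.0, the traversal spreads out,
# jumping to the far end before filling in the middle.
pts = [[0.0], [0.1], [0.5], [0.9], [1.0]]
print(farthest_first(pts, 3))  # [[0.0], [1.0], [0.5]]
```

This spreading behaviour is why the selected samples above mix very high and very low similarity vectors rather than clustering around one region.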
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 118 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 118 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)321_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 321), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)321_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 456
Number of entity ID pairs that occurred more than once: 0
Analyse set of 456 weight vectors
Containing 215 true matches and 241 true non-matches
(47.15% true matches)
Identified 421 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 406 (96.44%)
2 : 12 (2.85%)
3 : 2 (0.48%)
20 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 421 unique weight vectors)
Pureness (as fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 180
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 240
Removed 1 non-pure weight vector
Final number of weight vectors to use: 455
Number of unique weight vectors: 421
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(421, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 421 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 421 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 34 matches and 44 non-matches
Purity of oracle classification: 0.564
Entropy of oracle classification: 0.988
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 343 weight vectors
Based on 34 matches and 44 non-matches
Classified 141 matches and 202 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.5641025641025641, 0.9881108365218301, 0.4358974358974359)
(202, 0.5641025641025641, 0.9881108365218301, 0.4358974358974359)
Current size of match and non-match training data sets: 34 / 44
Selected cluster (queue ordering: random) with:
- Purity 0.56 and entropy 0.99
- Size 202 weight vectors
- Estimated match proportion 0.436
Sample size for this cluster: 64
Farthest first selection of 64 weight vectors from 202 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.929, 1.000, 0.182, 0.238, 0.188, 0.146, 0.270] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.367, 1.000, 0.160, 0.170, 0.077, 0.200, 0.178] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 0.000, 0.483, 0.818, 0.773, 0.478, 0.727] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 7 matches and 57 non-matches
Purity of oracle classification: 0.891
Entropy of oracle classification: 0.498
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(20)140_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 140), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)140_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
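The "far" initial selection above is a greedy farthest-first traversal over the cluster's weight vectors. A minimal sketch, assuming Euclidean distance and an arbitrary start vector (neither is stated in the log):

```python
import math

def farthest_first(vectors, k):
    # Greedy farthest-first traversal: start from an arbitrary vector,
    # then repeatedly add the vector whose minimum distance to the
    # already-selected set is largest, until k vectors are chosen.
    selected = [vectors[0]]
    while len(selected) < k:
        best = max(
            (v for v in vectors if v not in selected),
            key=lambda v: min(math.dist(v, s) for s in selected),
        )
        selected.append(best)
    return selected
```

Each pick maximises coverage of the weight-vector space, which is why the selected samples mix clear matches, clear non-matches, and borderline cases.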
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
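The purity and entropy reported above follow directly from the oracle's class counts (here 23 matches and 64 non-matches). A minimal sketch, assuming purity is the majority-class fraction and entropy is the base-2 Shannon entropy of the two-class split:

```python
import math

def purity_entropy(n_match, n_nonmatch):
    # Purity: fraction of the majority class.
    # Entropy: base-2 Shannon entropy of the match/non-match proportions.
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```

With counts (23, 64) this reproduces the 0.736 purity and 0.833 entropy printed above.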
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
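The SVM step trains on the 87 oracle-labelled vectors and partitions the 932 remaining ones into the two child clusters queued in the next loop. As a dependency-free illustration of that partitioning, this sketch substitutes a nearest-centroid rule for the SVM (with scikit-learn, `sklearn.svm.SVC` would be the natural real implementation; the run's kernel and parameters are not recorded in this log):

```python
def split_cluster(unlabelled, matches, nonmatches):
    # Stand-in for the SVM split: assign each unlabelled weight vector
    # to whichever class centroid (match or non-match) is closer.
    def centroid(vecs):
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    cm, cn = centroid(matches), centroid(nonmatches)
    pred_match, pred_nonmatch = [], []
    for v in unlabelled:
        (pred_match if sqdist(v, cm) < sqdist(v, cn) else pred_nonmatch).append(v)
    return pred_match, pred_nonmatch
```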
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)511_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 511), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)511_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
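Occurrence tables like the one above tabulate how many unique weight vectors occur once, twice, and so on. A minimal sketch of that two-level count:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # First count how often each unique weight vector occurs,
    # then count how many unique vectors share each occurrence count.
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())
```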
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 123 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 47 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)289_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 289), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)289_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 541
Number of entity ID pairs that occurred more than once: 0
Analyse set of 541 weight vectors
Containing 220 true matches and 321 true non-matches
(40.67% true matches)
Identified 503 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 485 (96.42%)
2 : 15 (2.98%)
3 : 2 (0.40%)
20 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 503 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 318
Removed 1 non-pure weight vector
Final number of weight vectors to use: 540
Number of unique weight vectors: 503
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(503, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 503 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 503 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 32 matches and 48 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 423 weight vectors
Based on 32 matches and 48 non-matches
Classified 142 matches and 281 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6, 0.9709505944546686, 0.4)
(281, 0.6, 0.9709505944546686, 0.4)
Current size of match and non-match training data sets: 32 / 48
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 142 weight vectors
- Estimated match proportion 0.400
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 53 matches and 3 non-matches
Purity of oracle classification: 0.946
Entropy of oracle classification: 0.301
Number of true matches: 53
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
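The purity and entropy figures reported above follow directly from the match/non-match counts; a minimal sketch, assuming purity is the majority-class fraction and entropy the binary Shannon entropy of the match proportion:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total          # match proportion
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = purity_and_entropy(53, 3)   # counts from the log above
print(round(purity, 3), round(entropy, 3))    # 0.946 0.301
```

With 53 matches and 3 non-matches this reproduces the purity 0.946 and entropy 0.301 printed above.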
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(15)2_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (15, 1 - acm diverg, 2), dtype: object
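The precision, recall, and f-measure printed in the record above are recoverable from the tp/fp/fn counts; a sketch of that arithmetic:

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure from raw confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# tp=54, fp=1, fn=245 as reported for diverg(15)2_NEW.csv
p, r, f = precision_recall_f(54, 1, 245)
print(round(p, 6), round(r, 6), round(f, 6))  # 0.981818 0.180602 0.305085
```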
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)2_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 597
Number of entity ID pairs that occurred more than once: 0
Analyse set of 597 weight vectors
Containing 201 true matches and 396 true non-matches
(33.67% true matches)
Identified 566 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 550 (97.17%)
2 : 13 (2.30%)
3 : 2 (0.35%)
15 : 1 (0.18%)
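An occurrence distribution like the one above can be built with two nested Counters; a small sketch on hypothetical toy vectors:

```python
from collections import Counter

# hypothetical toy data: five weight vectors, one of which occurs twice
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9), (0.4, 0.1), (0.7, 0.3)]

vector_counts = Counter(vectors)                    # unique vector -> occurrences
occurrence_dist = Counter(vector_counts.values())   # occurrences -> number of unique vectors

print(f"Identified {len(vector_counts)} unique weight vectors")
for occ, n in sorted(occurrence_dist.items()):
    print(f"{occ} : {n} ({100.0 * n / len(vector_counts):.2f}%)")
```

For the toy data this prints 4 unique vectors, with three occurring once (75.00%) and one occurring twice (25.00%).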
Identified 1 non-pure unique weight vector (from 566 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.933 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 393
Removed 1 non-pure weight vector
Final number of weight vectors to use: 596
Number of unique weight vectors: 566
Time to load and analyse the weight vector file: 0.01 sec
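The non-pure-vector cleanup above (keep only the majority-class copies of any unique vector whose labels disagree) can be sketched as follows; the data is a hypothetical stand-in:

```python
from collections import defaultdict

def remove_minority_copies(pairs):
    """Drop minority-class copies of unique vectors that carry conflicting labels."""
    groups = defaultdict(list)
    for vec, label in pairs:
        groups[vec].append(label)
    kept = []
    for vec, label in pairs:
        labels = groups[vec]
        majority = sum(labels) >= len(labels) / 2   # True wins ties
        if label == majority:
            kept.append((vec, label))
    return kept

# one vector occurs three times with labels True, True, False
data = [((1.0, 0.9), True), ((1.0, 0.9), True),
        ((1.0, 0.9), False), ((0.1, 0.2), False)]
print(len(remove_minority_copies(data)))  # 3
```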
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(566, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 566 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 566 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
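The "farthest first" selection used above is a greedy max-min traversal: repeatedly pick the vector farthest from everything selected so far. A minimal sketch, assuming Euclidean distance and an arbitrary seed vector:

```python
import math

def farthest_first(vectors, k):
    """Greedily select k vectors, each maximising its distance to the selected set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]                          # arbitrary seed
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[i])
        for j, v in enumerate(vectors):              # refresh distances to the new pick
            min_dist[j] = min(min_dist[j], dist(v, vectors[i]))
    return selected

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(farthest_first(pts, 3))  # [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
```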
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 31 matches and 51 non-matches
Purity of oracle classification: 0.622
Entropy of oracle classification: 0.957
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 484 weight vectors
Based on 31 matches and 51 non-matches
Classified 144 matches and 340 non-matches
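The SVM step above trains on the oracle-labelled vectors and classifies the remainder; a sketch using scikit-learn's SVC on hypothetical stand-in data (the actual kernel and parameters used by the tool are not shown in the log):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# hypothetical stand-ins for the 31 match / 51 non-match training vectors
train = np.vstack([rng.uniform(0.6, 1.0, (31, 7)),
                   rng.uniform(0.0, 0.5, (51, 7))])
labels = np.array([1] * 31 + [0] * 51)

unlabelled = rng.uniform(0.0, 1.0, (484, 7))  # the 484 remaining vectors

clf = SVC(kernel="rbf").fit(train, labels)
pred = clf.predict(unlabelled)
print(f"Classified {int(pred.sum())} matches and {int((pred == 0).sum())} non-matches")
```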
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
(340, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
Current size of match and non-match training data sets: 31 / 51
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 340 weight vectors
- Estimated match proportion 0.378
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 340 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.500, 0.826, 0.429, 0.538, 0.636] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 4 matches and 67 non-matches
Purity of oracle classification: 0.944
Entropy of oracle classification: 0.313
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing file: diverg(15)454_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 454), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)454_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 943
Number of entity ID pairs that occurred more than once: 0
Analyse set of 943 weight vectors
Containing 199 true matches and 744 true non-matches
(21.10% true matches)
Identified 898 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 864 (96.21%)
2 : 31 (3.45%)
3 : 2 (0.22%)
11 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 898 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 723
Removed 1 non-pure weight vector
Final number of weight vectors to use: 942
Number of unique weight vectors: 898
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(898, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 898 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 898 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 25 matches and 61 non-matches
Purity of oracle classification: 0.709
Entropy of oracle classification: 0.870
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 812 weight vectors
Based on 25 matches and 61 non-matches
Classified 123 matches and 689 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
(689, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
Current size of match and non-match training data sets: 25 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 689 weight vectors
- Estimated match proportion 0.291
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 689 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 13 matches and 58 non-matches
Purity of oracle classification: 0.817
Entropy of oracle classification: 0.687
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(15)166_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 166), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)166_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 829
Number of entity ID pairs that occurred more than once: 0
Analyse set of 829 weight vectors
Containing 227 true matches and 602 true non-matches
(27.38% true matches)
Identified 772 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 735 (95.21%)
2 : 34 (4.40%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 772 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 581
Removed 1 non-pure weight vector
Final number of weight vectors to use: 828
Number of unique weight vectors: 772
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(772, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 772 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 772 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
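The farthest-first selection above can be sketched as a Gonzalez-style traversal: each step greedily picks the vector that maximises the distance to its nearest already-selected vector. A minimal sketch assuming Euclidean distance over the normalized weight vectors; the logged run's seeding and tie-breaking may differ:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: each step selects the vector
    that maximises the distance to its closest already-selected one."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    selected = [int(rng.integers(len(vectors)))]
    # distance from every vector to the closest selected vector so far
    dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(dist.argmax())
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

Because every new pick maximises the minimum distance to the current selection, the sample spreads across the vector space instead of clustering near the seed.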
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and incorrectly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
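The purity and entropy figures above follow directly from the match/non-match split of the oracle-labelled sample: purity is the majority-class fraction and entropy is the base-2 Shannon entropy of the split. A quick check against the 28/57 split logged here:

```python
from math import log2

def purity_entropy(matches, non_matches):
    """Purity (majority-class fraction) and base-2 Shannon entropy
    of a binary match / non-match split."""
    total = matches + non_matches
    p = matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * log2(q)
    return purity, entropy

purity, entropy = purity_entropy(28, 57)
print(f"{purity:.3f} {entropy:.3f}")  # 0.671 0.914
```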
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 687 weight vectors
Based on 28 matches and 57 non-matches
Classified 150 matches and 537 non-matches
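The SVM step trains on the oracle-labelled sample (28 matches, 57 non-matches) and then classifies the remaining unlabelled weight vectors. A minimal sketch assuming scikit-learn's `SVC`; the training arrays below are random stand-ins, not the logged data, and the actual kernel choice comes from the run's configuration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
# hypothetical stand-ins for the 28 match / 57 non-match training vectors
X_train = np.vstack([rng.uniform(0.6, 1.0, (28, 7)),   # matches: high similarities
                     rng.uniform(0.0, 0.5, (57, 7))])  # non-matches: low similarities
y_train = np.array([1] * 28 + [0] * 57)
X_rest = rng.uniform(0.0, 1.0, (687, 7))               # still-unclassified vectors

clf = SVC(kernel="linear").fit(X_train, y_train)
pred = clf.predict(X_rest)
print(f"Classified {int(pred.sum())} matches and {int((pred == 0).sum())} non-matches")
```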
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(537, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 537 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 537 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and incorrectly classify 0
Classified 9 matches and 64 non-matches
Purity of oracle classification: 0.877
Entropy of oracle classification: 0.539
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)443_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976
recall 0.408027
f-measure 0.575472
da 125
dm 0
ndm 0
tp 122
fp 3
tn 4.76529e+07
fn 177
Name: (10, 1 - acm diverg, 443), dtype: object
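The precision, recall, and f-measure in the printout are consistent with the listed tp/fp/fn counts; a quick check with a hypothetical helper (not from the logged code):

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from raw confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = prf(tp=122, fp=3, fn=177)
print(round(p, 3), round(r, 6), round(f, 6))  # 0.976 0.408027 0.575472
```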
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)443_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 661
Number of entity ID pairs that occurred more than once: 0
Analyse set of 661 weight vectors
Containing 140 true matches and 521 true non-matches
(21.18% true matches)
Identified 645 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 634 (98.29%)
2 : 8 (1.24%)
3 : 2 (0.31%)
5 : 1 (0.16%)
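The occurrence histogram above counts how often each distinct weight vector repeats. With vectors stored as tuples (so they are hashable), it amounts to a double `Counter`; the 2-d vectors below are illustrative only, the real ones are the 7-d rows above:

```python
from collections import Counter

# illustrative 2-d weight vectors (real ones are 7-d)
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5),
           (0.2, 0.8), (0.9, 0.1), (0.9, 0.1), (0.3, 0.7)]
per_vector = Counter(vectors)             # vector -> occurrence count
histogram = Counter(per_vector.values())  # occurrence -> number of vectors
for occurrence, count in sorted(histogram.items()):
    print(f"{occurrence} : {count}")
# 1 : 3
# 2 : 2
```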
Identified 0 non-pure unique weight vectors (from 645 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 126
0.000 : 519
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 661
Number of unique weight vectors: 645
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(645, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 645 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 645 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and incorrectly classify 0
Classified 29 matches and 54 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.934
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 562 weight vectors
Based on 29 matches and 54 non-matches
Classified 90 matches and 472 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(90, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
(472, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
Current size of match and non-match training data sets: 29 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 90 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 90 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and incorrectly classify 0
Classified 40 matches and 5 non-matches
Purity of oracle classification: 0.889
Entropy of oracle classification: 0.503
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
125.0
Analysing file: diverg(20)443_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 443), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)443_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1027
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1027 weight vectors
Containing 223 true matches and 804 true non-matches
(21.71% true matches)
Identified 973 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 936 (96.20%)
2 : 34 (3.49%)
3 : 2 (0.21%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 973 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 783
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1026
Number of unique weight vectors: 973
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(973, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 973 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 973 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and incorrectly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 886 weight vectors
Based on 26 matches and 61 non-matches
Classified 131 matches and 755 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(755, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 755 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 755 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and incorrectly classify 0
Classified 11 matches and 62 non-matches
Purity of oracle classification: 0.849
Entropy of oracle classification: 0.612
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)285_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 285), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)285_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 460
Number of entity ID pairs that occurred more than once: 0
Analyse set of 460 weight vectors
Containing 210 true matches and 250 true non-matches
(45.65% true matches)
Identified 426 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 409 (96.01%)
2 : 14 (3.29%)
3 : 2 (0.47%)
17 : 1 (0.23%)
Identified 1 non-pure unique weight vector (from 426 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 247
Removed 1 non-pure weight vector
Final number of weight vectors to use: 459
Number of unique weight vectors: 426
Time to load and analyse the weight vector file: 0.01 sec
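The "frequency distribution of occurrences" above tallies how often each distinct weight vector appears in the file. A sketch of how such a tally can be produced with `collections.Counter` — the vectors here are hypothetical stand-ins, not taken from the CSV:

```python
from collections import Counter

# Hypothetical weight vectors; in the run above these come from the CSV file.
vectors = [
    (1.0, 0.0, 0.5), (1.0, 0.0, 0.5),                    # occurs twice
    (0.7, 1.0, 0.3),                                     # occurs once
    (0.2, 0.9, 0.9), (0.2, 0.9, 0.9), (0.2, 0.9, 0.9),  # occurs three times
]

occurrences = Counter(vectors)                # vector -> how often it occurs
distribution = Counter(occurrences.values())  # occurrence count -> number of vectors

unique = len(occurrences)
for occ, count in sorted(distribution.items()):
    print(f"{occ} : {count} ({100.0 * count / unique:.2f}%)")
# prints:
# 1 : 1 (33.33%)
# 2 : 1 (33.33%)
# 3 : 1 (33.33%)
```

The percentages are relative to the number of unique vectors, which is how the log reports them (e.g. 409 of 426 unique vectors, 96.01%).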
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(426, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 426 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 426 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 37 matches and 41 non-matches
Purity of oracle classification: 0.526
Entropy of oracle classification: 0.998
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 348 weight vectors
Based on 37 matches and 41 non-matches
Classified 246 matches and 102 non-matches
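After the oracle step, the log shows an SVM trained on the oracle-labelled sample and applied to the remaining vectors to split them into candidate match/non-match clusters. A minimal sketch of that step, assuming scikit-learn's default `SVC`; the actual kernel, parameters, and features are not shown in the log, and the data below is random stand-in data:

```python
import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn's SVM implementation

rng = np.random.default_rng(0)

# Stand-in training data for the 37 matches / 41 non-matches labelled by
# the oracle (7 similarity weights per vector, as in the log).
X_train = rng.random((78, 7))
y_train = np.array([1] * 37 + [0] * 41)  # 1 = match, 0 = non-match

# The 348 weight vectors remaining in the cluster after deletion.
X_rest = rng.random((348, 7))

clf = SVC()  # kernel and parameters are an assumption
clf.fit(X_train, y_train)
pred = clf.predict(X_rest)

print(f"Classified {int(pred.sum())} matches and "
      f"{int((pred == 0).sum())} non-matches")
```

The two predicted groups then re-enter the queue as new clusters, which matches the queue length of 2 in the next loop.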
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(246, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
(102, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
Current size of match and non-match training data sets: 37 / 41
Selected cluster (queue ordering: random) with:
- Purity 0.53 and entropy 1.00
- Size 102 weight vectors
- Estimated match proportion 0.474
Sample size for this cluster: 50
Farthest first selection of 50 weight vectors from 102 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[0.635, 1.000, 0.179, 0.265, 0.167, 0.121, 0.241] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.800, 1.000, 0.111, 0.200, 0.100, 0.194, 0.094] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
Perform oracle with 100.00% accuracy on 50 weight vectors
The oracle will correctly classify 50 weight vectors and wrongly classify 0
Classified 0 matches and 50 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 50 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analyzing file: diverg(10)853_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 853), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)853_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 432
Number of entity ID pairs that occurred more than once: 0
Analyse set of 432 weight vectors
Containing 194 true matches and 238 true non-matches
(44.91% true matches)
Identified 408 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 391 (95.83%)
2 : 14 (3.43%)
3 : 2 (0.49%)
7 : 1 (0.25%)
Identified 0 non-pure unique weight vectors (from 408 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.000 : 236
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 432
Number of unique weight vectors: 408
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(408, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 408 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 408 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 38 matches and 40 non-matches
Purity of oracle classification: 0.513
Entropy of oracle classification: 1.000
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 40
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 330 weight vectors
Based on 38 matches and 40 non-matches
Classified 269 matches and 61 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(269, 0.5128205128205128, 0.9995256892936493, 0.48717948717948717)
(61, 0.5128205128205128, 0.9995256892936493, 0.48717948717948717)
Current size of match and non-match training data sets: 38 / 40
Selected cluster (queue ordering: random) with:
- Purity 0.51 and entropy 1.00
- Size 269 weight vectors
- Estimated match proportion 0.487
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 269 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.913, 1.000, 0.184, 0.175, 0.087, 0.233, 0.167] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 43 matches and 28 non-matches
Purity of oracle classification: 0.606
Entropy of oracle classification: 0.968
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
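The "farthest first" sampling announced in each loop greedily picks, at every step, the vector with the greatest distance to its nearest already-selected vector. A numpy sketch of that traversal, assuming Euclidean distance and an arbitrary first pick (the log does not show which metric or seed vector the program actually uses):

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int) -> list[int]:
    """Greedy farthest-first traversal: return indices of k selected vectors."""
    selected = [0]  # assumption: start from the first vector
    # Distance from every vector to its nearest selected vector so far.
    dist = np.linalg.norm(vectors - vectors[0], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist))  # farthest from everything selected so far
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

rng = np.random.default_rng(0)
sample = rng.random((100, 7))  # hypothetical 7-dimensional weight vectors
picks = farthest_first(sample, 10)
print(picks)
```

Each new pick maximizes the minimum distance to the current selection, which is why the selected vectors listed in the log are spread across the extremes of the weight space.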
68.0
Analyzing file: diverg(20)111_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 111), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)111_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 0 matches and 829 non-matches
40.0
Analyzing file: diverg(20)745_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 745), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)745_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 855
Number of entity ID pairs that occurred more than once: 0
Analyse set of 855 weight vectors
Containing 221 true matches and 634 true non-matches
(25.85% true matches)
Identified 799 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 763 (95.49%)
2 : 33 (4.13%)
3 : 2 (0.25%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 799 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 613
Removed 1 non-pure weight vector
Final number of weight vectors to use: 854
Number of unique weight vectors: 799
Time to load and analyse the weight vector file: 0.01 sec
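The pureness filter reported above groups identical weight vectors and, for any group that mixes matches and non-matches, removes the minority-class copies. A sketch of that filter on hypothetical data, assuming pureness is the fraction of matches within a group and that ties go to the match class (the log only shows the resulting counts, not the exact rule):

```python
from collections import defaultdict

# Hypothetical (vector, is_match) pairs; a real run reads these from the CSV.
labelled = [
    ((1.0, 0.0), True), ((1.0, 0.0), True), ((1.0, 0.0), False),  # non-pure group
    ((0.2, 0.9), False),                                          # pure group
]

groups = defaultdict(list)
for vec, is_match in labelled:
    groups[vec].append(is_match)

kept = []
for vec, labels in groups.items():
    pureness = sum(labels) / len(labels)  # fraction of matches in this group
    for is_match in labels:
        # Keep pure groups whole; in non-pure groups, keep only the
        # majority-class copies (assumption: ties count as matches).
        if pureness in (0.0, 1.0) or is_match == (pureness >= 0.5):
            kept.append((vec, is_match))

print(f"Removed {len(labelled) - len(kept)} minority-class weight vector copies")
# prints: Removed 1 minority-class weight vector copies
```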
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(799, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 799 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 799 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
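The "far" selection listed above appears to be a greedy farthest-first traversal (Gonzalez-style): repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch under that assumption; the Euclidean metric and the name `farthest_first` are my guesses, not confirmed by the log:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: starting from one seed vector,
    repeatedly add the vector with the largest minimum Euclidean
    distance to the vectors selected so far."""
    selected = [start]
    # current minimum distance of every vector to the selected set
    dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=dist.__getitem__)
        selected.append(nxt)
        for i, v in enumerate(vectors):
            dist[i] = min(dist[i], math.dist(v, vectors[nxt]))
    return selected
```

This is why the selected sample spreads across extreme corners of the weight space (many 0.000/1.000 entries) rather than clustering around typical vectors.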
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 714 weight vectors
Based on 28 matches and 57 non-matches
Classified 150 matches and 564 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(564, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 150 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 150 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
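The purity and entropy figures printed for each oracle classification are consistent with the majority-class fraction and the binary Shannon entropy (in bits) of the match/non-match split. A sketch, checked against the 28-match / 57-non-match split reported in Loop 1 above (which the queue line echoes as 0.6705882…, 0.9143242…):

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity = fraction of the majority class;
    entropy = binary Shannon entropy of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

A perfectly balanced cluster gives purity 0.5 and entropy 1.0, which matches the initial `(637, 0.5, 1.0, 0.5)` queue entries in this log.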
46.0
Analysing the file: diverg(15)768_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 768), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)768_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 668
Number of entity ID pairs that occurred more than once: 0
Analyse set of 668 weight vectors
Containing 207 true matches and 461 true non-matches
(30.99% true matches)
Identified 637 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 623 (97.80%)
2 : 11 (1.73%)
3 : 2 (0.31%)
17 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 637 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 460
Removed 1 non-pure weight vector
Final number of weight vectors to use: 667
Number of unique weight vectors: 637
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(637, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 637 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 637 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 29 matches and 54 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.934
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 554 weight vectors
Based on 29 matches and 54 non-matches
Classified 140 matches and 414 non-matches
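The SVM step trains on the oracle-labelled samples accumulated so far and splits the remaining cluster members into predicted matches and non-matches, which become the next two clusters in the queue. A hedged sketch assuming scikit-learn; the kernel choice and the helper name `svm_split` are guesses, not confirmed by the log:

```python
# Hypothetical reconstruction of the SVM classification step.
from sklearn import svm

def svm_split(train_vecs, train_labels, remaining_vecs):
    """Fit an SVM on oracle-labelled weight vectors, then partition the
    unlabelled remainder into predicted matches and non-matches."""
    clf = svm.SVC(kernel="linear")  # kernel is an assumption
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, pred) if p]
    non_matches = [v for v, p in zip(remaining_vecs, pred) if not p]
    return matches, non_matches
```

In this log the split of 554 remaining vectors into 140 predicted matches and 414 predicted non-matches corresponds to the two queue entries shown in Loop 2 below.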
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
(414, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
Current size of match and non-match training data sets: 29 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 140 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 140 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 50 matches and 4 non-matches
Purity of oracle classification: 0.926
Entropy of oracle classification: 0.381
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(15)912_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (15, 1 - acm diverg, 912), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)912_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 323
Number of entity ID pairs that occurred more than once: 0
Analyse set of 323 weight vectors
Containing 207 true matches and 116 true non-matches
(64.09% true matches)
Identified 291 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 277 (95.19%)
2 : 11 (3.78%)
3 : 2 (0.69%)
18 : 1 (0.34%)
Identified 1 non-pure unique weight vector (from 291 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 115
Removed 1 non-pure weight vector
Final number of weight vectors to use: 322
Number of unique weight vectors: 291
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(291, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 291 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 72
Perform initial selection using "far" method
Farthest first selection of 72 weight vectors from 291 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 33 matches and 39 non-matches
Purity of oracle classification: 0.542
Entropy of oracle classification: 0.995
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 219 weight vectors
Based on 33 matches and 39 non-matches
Classified 146 matches and 73 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 72
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.5416666666666666, 0.9949848281859701, 0.4583333333333333)
(73, 0.5416666666666666, 0.9949848281859701, 0.4583333333333333)
Current size of match and non-match training data sets: 33 / 39
Selected cluster (queue ordering: random) with:
- Purity 0.54 and entropy 0.99
- Size 146 weight vectors
- Estimated match proportion 0.458
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 146 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 50 matches and 8 non-matches
Purity of oracle classification: 0.862
Entropy of oracle classification: 0.579
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
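The purity and entropy figures logged for each oracle step can be reproduced as the majority-class fraction and the binary Shannon entropy of the match/non-match split. A minimal sketch (this mirrors the logged numbers, not the tool's actual source):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction; entropy is the binary
    Shannon entropy of the match/non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)
    return purity, entropy
```

For the 50 matches / 8 non-matches classified above this gives purity 0.862 and entropy 0.579, matching the log.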
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
48.0
Analysing file: diverg(15)265_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 265), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)265_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 902
Number of entity ID pairs that occurred more than once: 0
Analyse set of 902 weight vectors
Containing 214 true matches and 688 true non-matches
(23.73% true matches)
Identified 850 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 814 (95.76%)
2 : 33 (3.88%)
3 : 2 (0.24%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 850 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 667
Removed 1 non-pure weight vector
Final number of weight vectors to use: 901
Number of unique weight vectors: 850
Time to load and analyse the weight vector file: 0.01 sec
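The uniqueness and occurrence statistics above can be computed with `collections.Counter` (a sketch of an assumed implementation, not the tool's own code):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each distinct weight vector occurs, then map
    each occurrence count to the number of unique vectors that occur
    that often (1 -> vectors seen once, 2 -> seen twice, ...)."""
    per_vector = Counter(map(tuple, weight_vectors))
    return len(per_vector), Counter(per_vector.values())
```

For the 902 vectors above this would report 850 unique vectors with distribution {1: 814, 2: 33, 3: 2, 16: 1} (814 + 2*33 + 3*2 + 16 = 902).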
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(850, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 850 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 850 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
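The "farthest first" selection logged above is presumably a greedy farthest-first traversal (Gonzalez-style). A minimal sketch, assuming Euclidean distance and the first vector as seed:

```python
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors: start from the first one, then
    repeatedly add the vector whose distance to its nearest
    already-selected vector is largest."""
    selected = [0]  # indices of selected vectors
    # distance of every vector to its nearest selected vector
    nearest = [math.dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=nearest.__getitem__)
        selected.append(i)
        nearest = [min(d, math.dist(v, vectors[i]))
                   for d, v in zip(nearest, vectors)]
    return [vectors[j] for j in selected]
```

This keeps the per-vector nearest-selected distance incrementally updated, giving O(n*k) distance computations instead of recomputing all pairs each round.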
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 764 weight vectors
Based on 29 matches and 57 non-matches
Classified 181 matches and 583 non-matches
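The SVM split of the remaining unlabelled vectors can be sketched as follows, assuming scikit-learn's SVC with default parameters (the tool's actual kernel and settings are unknown):

```python
# Sketch of the SVM-based cluster split; scikit-learn SVC with
# default parameters is an assumption.
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train on the oracle-labelled vectors (True = match), then split
    the remaining vectors into predicted matches and non-matches."""
    clf = SVC().fit(train_vectors, train_labels)
    predictions = clf.predict(remaining_vectors)
    matches = [v for v, m in zip(remaining_vectors, predictions) if m]
    non_matches = [v for v, m in zip(remaining_vectors, predictions) if not m]
    return matches, non_matches
```

The two resulting lists become the two child clusters pushed onto the queue for the next loop.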
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(181, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(583, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 583 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 583 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.538, 0.789, 0.353, 0.545, 0.550] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.857, 0.417, 0.750, 0.500, 0.455] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.692, 0.292, 0.500, 0.818, 0.308] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 0 matches and 75 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 75
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(20)96_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 96), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)96_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1027
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1027 weight vectors
Containing 223 true matches and 804 true non-matches
(21.71% true matches)
Identified 973 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 936 (96.20%)
2 : 34 (3.49%)
3 : 2 (0.21%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 973 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 783
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1026
Number of unique weight vectors: 973
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(973, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 973 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 973 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 886 weight vectors
Based on 26 matches and 61 non-matches
Classified 131 matches and 755 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(755, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 131 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 50
Farthest first selection of 50 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 50 weight vectors
The oracle will correctly classify 50 weight vectors and wrongly classify 0
Classified 49 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.141
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 50 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)525_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978723
recall 0.461538
f-measure 0.627273
da 141
dm 0
ndm 0
tp 138
fp 3
tn 4.76529e+07
fn 161
Name: (10, 1 - acm diverg, 525), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)525_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 802
Number of entity ID pairs that occurred more than once: 0
Analyse set of 802 weight vectors
Containing 118 true matches and 684 true non-matches
(14.71% true matches)
Identified 772 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 745 (96.50%)
2 : 24 (3.11%)
3 : 3 (0.39%)
Identified 0 non-pure unique weight vectors (from 772 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 108
0.000 : 664
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 802
Number of unique weight vectors: 772
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(772, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 772 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 772 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 24 matches and 61 non-matches
Purity of oracle classification: 0.718
Entropy of oracle classification: 0.859
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
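The purity and entropy values reported above follow the usual two-class definitions: purity is the fraction of the majority class in the labelled sample, and entropy is the Shannon entropy (in bits) of the match/non-match split. A minimal sketch in plain Python (the function name is ours, not from the tool), using this round's counts:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Two-class purity (majority-class fraction) and Shannon entropy (bits)."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# This round: 24 matches and 61 non-matches among 85 oracle-labelled vectors.
purity, entropy = purity_entropy(24, 61)
print(round(purity, 3), round(entropy, 3))  # 0.718 0.859
```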
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 687 weight vectors
Based on 24 matches and 61 non-matches
Classified 82 matches and 605 non-matches
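The log does not show which SVM implementation performs this split. As a sketch, the step — train on the oracle-labelled vectors, then classify the rest of the cluster into a match and a non-match sub-cluster — can be reproduced with a minimal Pegasos-style linear SVM (NumPy only; the function names and the toy data below are ours, not the tool's):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=300, seed=0):
    """Linear SVM fitted with Pegasos-style sub-gradient descent.
    X: (n, d) array, y: labels in {-1, +1}. The bias is learned via an
    appended constant feature (so it is regularised too -- a common
    simplification). Returns the augmented weight vector."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    rng = np.random.default_rng(seed)
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            w *= 1.0 - eta * lam           # shrink (regularisation step)
            if y[i] * (Xa[i] @ w) < 1.0:   # margin violated: push towards y[i]
                w += eta * y[i] * Xa[i]
    return w

def svm_predict(w, X):
    Xa = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xa @ w)

# Toy version of the logged step: train on the oracle-labelled vectors
# (+1 = match, -1 = non-match), then split the remaining vectors.
X_train = np.array([[1.0, 1.0, 0.9], [0.9, 1.0, 0.8],   # matches
                    [1.0, 0.0, 0.3], [0.8, 0.0, 0.4]])  # non-matches
y_train = np.array([1.0, 1.0, -1.0, -1.0])
w = train_linear_svm(X_train, y_train)

X_rest = np.array([[0.95, 1.0, 0.85], [0.90, 0.0, 0.20]])
pred = svm_predict(w, X_rest)   # +1 -> match sub-cluster, -1 -> non-match
```

Each of the two resulting sub-clusters is then pushed back onto the queue, which is why the queue length grows to 2 in the next loop.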
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(82, 0.7176470588235294, 0.8586370819183629, 0.2823529411764706)
(605, 0.7176470588235294, 0.8586370819183629, 0.2823529411764706)
Current size of match and non-match training data sets: 24 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 605 weight vectors
- Estimated match proportion 0.282
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 605 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.333, 0.545, 0.476, 0.727, 0.762] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
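Farthest-first selection greedily picks, at each step, the vector whose distance to its nearest already-selected vector is largest, which yields a diverse sample for the oracle. A plain-Python sketch (Euclidean distance assumed; how the real tool seeds the first vector is not shown in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: return indices of k diverse vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # Distance of every vector to its nearest selected vector so far.
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

vecs = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [1.0, 0.0]]
print(farthest_first(vecs, 3))  # [0, 1, 3]
```

Updating `min_dist` incrementally keeps each of the k rounds linear in the number of vectors, O(k·n) distance computations overall.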
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 2 matches and 67 non-matches
Purity of oracle classification: 0.971
Entropy of oracle classification: 0.189
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
141.0
Analysing the file: diverg(20)550_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 550), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)550_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1068
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1068 weight vectors
Containing 226 true matches and 842 true non-matches
(21.16% true matches)
Identified 1011 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 974 (96.34%)
2 : 34 (3.36%)
3 : 2 (0.20%)
20 : 1 (0.10%)
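A distribution like the one above can be reproduced with two nested `Counter`s: first count how often each unique vector occurs, then count how many vectors share each occurrence count. A sketch in plain Python (names are ours):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in vectors)  # vector -> #occurrences
    return Counter(per_vector.values())              # #occurrences -> #vectors

# Toy data: one vector occurring 3x, one 2x, one 1x.
vecs = [(1.0, 0.5)] * 3 + [(0.2, 0.2)] * 2 + [(0.9, 0.1)]
dist = occurrence_distribution(vecs)
total_unique = sum(dist.values())
for occ in sorted(dist):
    print(f"{occ} : {dist[occ]} ({100.0 * dist[occ] / total_unique:.2f}%)")
# 1 : 1 (33.33%)
# 2 : 1 (33.33%)
# 3 : 1 (33.33%)
```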
Identified 1 non-pure unique weight vector (from 1011 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 821
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1067
Number of unique weight vectors: 1011
Time to load and analyse the weight vector file: 0.01 sec
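The pureness step groups identical weight vectors, computes the fraction of true matches per group, and removes the minority-class copies of any group that is neither pure match (1.000) nor pure non-match (0.000) — here 1 of the 1068 vectors. A sketch, assuming records are (vector, is_match) pairs (all names are ours):

```python
from collections import defaultdict

def remove_non_pure(records):
    """records: list of (vector_tuple, is_match) pairs. For any vector that
    occurs with both labels, keep only the majority-class copies."""
    groups = defaultdict(list)
    for vec, is_match in records:
        groups[vec].append(is_match)
    kept = []
    for vec, labels in groups.items():
        n_match = sum(labels)
        pureness = n_match / len(labels)
        if pureness in (0.0, 1.0):            # pure group: keep everything
            kept += [(vec, m) for m in labels]
        else:                                 # mixed group: keep majority only
            majority = pureness >= 0.5        # ties go to the match class
            kept += [(vec, majority)] * max(n_match, len(labels) - n_match)
    return kept

# 20 copies of one vector, 19 labelled match and 1 non-match (pureness 0.950),
# plus one pure non-match vector: the single minority copy is removed.
recs = [((1.0, 1.0), True)] * 19 + [((1.0, 1.0), False)] + [((0.1, 0.2), False)]
print(len(recs), "->", len(remove_non_pure(recs)))  # 21 -> 20
```

Removing mixed-label duplicates up front guarantees every remaining unique weight vector has a single true label, which the oracle simulation relies on.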
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1011, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1011 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1011 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 924 weight vectors
Based on 24 matches and 63 non-matches
Classified 131 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(793, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 131 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)1_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 1), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)1_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 380
Number of entity ID pairs that occurred more than once: 0
Analyse set of 380 weight vectors
Containing 216 true matches and 164 true non-matches
(56.84% true matches)
Identified 347 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 331 (95.39%)
2 : 13 (3.75%)
3 : 2 (0.58%)
17 : 1 (0.29%)
Identified 1 non-pure unique weight vector (from 347 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 163
Removed 1 non-pure weight vector
Final number of weight vectors to use: 379
Number of unique weight vectors: 347
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(347, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 347 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 75
Perform initial selection using "far" method
Farthest first selection of 75 weight vectors from 347 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 46 matches and 29 non-matches
Purity of oracle classification: 0.613
Entropy of oracle classification: 0.963
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 29
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 272 weight vectors
Based on 46 matches and 29 non-matches
Classified 272 matches and 0 non-matches
42.0
Analysing the file: diverg(15)422_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.99
recall 0.331104
f-measure 0.496241
da 100
dm 0
ndm 0
tp 99
fp 1
tn 4.76529e+07
fn 200
Name: (15, 1 - acm diverg, 422), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)422_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1020
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1020 weight vectors
Containing 167 true matches and 853 true non-matches
(16.37% true matches)
Identified 981 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 952 (97.04%)
2 : 26 (2.65%)
3 : 2 (0.20%)
10 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 981 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 148
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 832
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1019
Number of unique weight vectors: 981
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(981, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 981 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 981 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
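The "farthest first" sampling that produced lists like the one above can be sketched as a greedy traversal: start from one vector, then repeatedly add the vector whose distance to the already-selected set is largest. A minimal sketch with random stand-in data, assuming Euclidean distance and an arbitrary start index (neither is shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k row indices from a 2-D array."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # distance of every vector to its nearest already-selected vector
    dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist))             # farthest from the selected set
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

rng = np.random.default_rng(42)
cluster = rng.random((698, 7))   # stand-in for one cluster of weight vectors
picked = farthest_first(cluster, 77)
```

Because the running minimum distance is zero for every selected vector, `argmax` never re-picks an index as long as the data contain no exact duplicates.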
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
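The purity and entropy figures reported after each oracle step can be reproduced directly: purity is the majority-class fraction of the classified sample, and entropy is the binary Shannon entropy of the match proportion (0.655 and 0.929 for 30 matches and 57 non-matches, matching the queue statistics in the log). A short sketch:

```python
from math import log2

def purity_and_entropy(num_matches: int, num_non_matches: int):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total                    # match proportion
    purity = max(p, 1.0 - p)                   # fraction in the majority class
    if p in (0.0, 1.0):                        # pure sample: zero entropy
        return purity, 0.0
    return purity, -p * log2(p) - (1.0 - p) * log2(1.0 - p)

print(purity_and_entropy(30, 57))   # ~ (0.655, 0.929), as in the log above
```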
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 894 weight vectors
Based on 30 matches and 57 non-matches
Classified 196 matches and 698 non-matches
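The SVM step trains on the oracle-labelled sample and splits the cluster's remaining unlabelled vectors into predicted matches and non-matches. A minimal scikit-learn sketch with random stand-in data (the kernel and parameters are assumptions, not taken from the log):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# 30 oracle-labelled matches and 57 non-matches, 7 similarity weights each
X_train = rng.random((87, 7))
y_train = np.array([1] * 30 + [0] * 57)

clf = SVC(kernel="linear")                     # assumed kernel
clf.fit(X_train, y_train)

# split the cluster's 894 remaining vectors into two child clusters
X_rest = rng.random((894, 7))
pred = clf.predict(X_rest)
matches, non_matches = X_rest[pred == 1], X_rest[pred == 0]
```

The two resulting subsets become the new entries in the cluster queue, each initially inheriting the purity and entropy of the sample they were derived from.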
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(196, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(698, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 698 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 698 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing the file: diverg(20)744_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 744), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)744_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1100
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1100 weight vectors
Containing 227 true matches and 873 true non-matches
(20.64% true matches)
Identified 1043 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1006 (96.45%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
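A frequency distribution like the one above can be built with two nested `Counter`s: one counting how often each unique vector occurs, and a second counting how many unique vectors share each occurrence count. A small self-contained sketch with toy vectors:

```python
from collections import Counter

vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.2, 0.3), (0.9, 0.1)]

vec_counts = Counter(map(tuple, vectors))   # occurrences of each unique vector
freq_dist = Counter(vec_counts.values())    # occurrence -> number of unique vectors
total_unique = len(vec_counts)

for occurrence, count in sorted(freq_dist.items()):
    print(f"{occurrence} : {count} ({100.0 * count / total_unique:.2f}%)")
```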
Identified 1 non-pure unique weight vector (from 1043 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1099
Number of unique weight vectors: 1043
Time to load and analyse the weight vector file: 0.01 sec
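The non-pure filter above removes the minority-class copies of any unique vector that carries mixed labels (here one vector occurring 20 times with pureness 0.950, so its single non-match copy is dropped). A minimal sketch, assuming ties between classes are broken arbitrarily:

```python
from collections import defaultdict

def drop_minority_copies(vectors, labels):
    """Remove minority-class copies of unique vectors with mixed labels."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, grp in groups.items():
        majority = max(set(grp), key=grp.count)  # tie broken arbitrarily
        kept += [(vec, lab) for lab in grp if lab == majority]
    return kept

# one vector occurring 20 times: 19 matches, 1 non-match (pureness 0.95)
vecs = [(0.9, 1.0)] * 20 + [(0.1, 0.0)]
labs = [True] * 19 + [False] + [False]
kept = drop_minority_copies(vecs, labs)
print(len(kept))   # 20: the single minority-class copy was removed
```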
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1043, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1043 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1043 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 955 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 846 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(846, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)39_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 39), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)39_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 959
Number of entity ID pairs that occurred more than once: 0
Analyse set of 959 weight vectors
Containing 217 true matches and 742 true non-matches
(22.63% true matches)
Identified 904 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 868 (96.02%)
2 : 33 (3.65%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 904 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 721
Removed 1 non-pure weight vector
Final number of weight vectors to use: 958
Number of unique weight vectors: 904
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(904, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 904 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 904 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 817 weight vectors
Based on 27 matches and 60 non-matches
Classified 150 matches and 667 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(667, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 150 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 150 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)951_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 951), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)951_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
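The "far" initial selection is a greedy farthest-first traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A Euclidean sketch (the tool's actual seed choice and distance metric are assumptions):

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first traversal over equal-length numeric
    vectors; returns the indices of the k selected vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [seed_index]
    # Minimum distance from each vector to the selected set so far.
    min_dist = [dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

pts = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(farthest_first(pts, 3))  # [0, 2, 3]
```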
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
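The SVM step fits a classifier on the oracle-labelled sample (23 matches, 65 non-matches) and predicts labels for the unlabelled remainder of the cluster, which then splits into a predicted-match and a predicted-non-match cluster. A scikit-learn sketch with toy data; the kernel and parameters actually used by the tool are assumptions:

```python
from sklearn.svm import SVC

# Oracle-labelled training sample: weight vectors + match labels.
train_X = [[1.0, 1.0, 0.9], [0.9, 1.0, 0.8],   # matches
           [0.2, 0.0, 0.1], [0.3, 0.0, 0.2]]   # non-matches
train_y = [1, 1, 0, 0]

# Remaining (unlabelled) vectors in the cluster.
rest_X = [[0.95, 1.0, 0.85], [0.25, 0.0, 0.15]]

clf = SVC(kernel="linear").fit(train_X, train_y)
pred = clf.predict(rest_X)
print(list(pred))  # one predicted match, one predicted non-match
```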
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
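At 100.00% accuracy the oracle never mislabels; lower accuracies would presumably be simulated by flipping each true label with probability 1 − accuracy. A sketch of that assumption:

```python
import random

def simulated_oracle(true_labels, accuracy, rng=None):
    """Return labels where each true label is kept with probability
    `accuracy` and flipped otherwise (assumed simulation scheme)."""
    rng = rng or random.Random(0)
    return [lbl if rng.random() < accuracy else not lbl
            for lbl in true_labels]

labels = [True] * 14 + [False] * 54
# At accuracy 1.0 every classification is correct, as in this log.
assert simulated_oracle(labels, 1.0) == labels
```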
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)74_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 74), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)74_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1100
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1100 weight vectors
Containing 227 true matches and 873 true non-matches
(20.64% true matches)
Identified 1043 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1006 (96.45%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1043 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1099
Number of unique weight vectors: 1043
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1043, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1043 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1043 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 955 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 846 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(846, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)83_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 83), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)83_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 956 non-matches
39.0
Analysing file: diverg(20)170_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 170), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)170_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
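The sample sizes in this run (88 from 1044, 71 from 273, 45 from 87, ...) are consistent with Cochran's sample-size formula with a finite-population correction, using z = 1.96, margin of error e = 0.1, and p = the cluster's estimated match proportion. This is an inference from the numbers, not something the log states, and the exact rounding may differ:

```python
def sample_size(population: int, p: float, z: float = 1.96, e: float = 0.1) -> int:
    """Cochran's sample size with finite-population correction (assumed)."""
    n0 = z * z * p * (1.0 - p) / (e * e)
    return round(n0 / (1.0 + (n0 - 1.0) / population))

print(sample_size(1044, 0.5))  # 88
print(sample_size(273, 0.5))   # 71
print(sample_size(87, 0.423))  # 45
```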
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
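The farthest-first selection above can be sketched as a greedy traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest. Euclidean distance and a fixed starting vector are assumptions here; the actual seeding may be random:

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Greedy farthest-first traversal over row vectors."""
    selected = [start]
    # Distance of every vector to its nearest selected vector so far.
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return vectors[selected]

# Tiny demo: from points on a line, the first picks are the extremes.
pts = np.array([[0.0], [0.1], [0.5], [0.9], [1.0]])
print(farthest_first(pts, 3).ravel())  # [0.  1.  0.5]
```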
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 28 matches and 60 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 28 matches and 60 non-matches
Classified 156 matches and 800 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
(800, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
Current size of match and non-match training data sets: 28 / 60
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 800 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 800 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.067, 0.550, 0.636, 0.500, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.333, 0.545, 0.476, 0.727, 0.762] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 4 matches and 71 non-matches
Purity of oracle classification: 0.947
Entropy of oracle classification: 0.300
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)413_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (10, 1 - acm diverg, 413), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)413_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 289
Number of entity ID pairs that occurred more than once: 0
Analyse set of 289 weight vectors
Containing 157 true matches and 132 true non-matches
(54.33% true matches)
Identified 273 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 261 (95.60%)
2 : 9 (3.30%)
3 : 2 (0.73%)
4 : 1 (0.37%)
Identified 0 non-pure unique weight vectors (from 273 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 143
0.000 : 130
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 289
Number of unique weight vectors: 273
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(273, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 273 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 71
Perform initial selection using "far" method
Farthest first selection of 71 weight vectors from 273 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 30 matches and 41 non-matches
Purity of oracle classification: 0.577
Entropy of oracle classification: 0.983
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 202 weight vectors
Based on 30 matches and 41 non-matches
Classified 115 matches and 87 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 71
Size, purity, entropy, and estimated match proportion of clusters in queue:
(115, 0.5774647887323944, 0.982615428552612, 0.4225352112676056)
(87, 0.5774647887323944, 0.982615428552612, 0.4225352112676056)
Current size of match and non-match training data sets: 30 / 41
Selected cluster with (queue ordering: random):
- Purity 0.58 and entropy 0.98
- Size 87 weight vectors
- Estimated match proportion 0.423
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 87 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and wrongly classify 0
Classified 2 matches and 43 non-matches
Purity of oracle classification: 0.956
Entropy of oracle classification: 0.262
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing file: diverg(15)652_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 652), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)652_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 809
Number of entity ID pairs that occurred more than once: 0
Analyse set of 809 weight vectors
Containing 223 true matches and 586 true non-matches
(27.56% true matches)
Identified 755 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 718 (95.10%)
2 : 34 (4.50%)
3 : 2 (0.26%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 755 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 565
Removed 1 non-pure weight vector
Final number of weight vectors to use: 808
Number of unique weight vectors: 755
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(755, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 755 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 755 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 670 weight vectors
Based on 27 matches and 58 non-matches
Classified 94 matches and 576 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(576, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 576 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 576 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
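The farthest-first selections logged above can be sketched as follows. This is a generic greedy k-center traversal, not the original implementation; the Euclidean distance, the seeded random start, and the tie-breaking are all assumptions:

```python
import math
import random

def euclidean(a, b):
    # Euclidean distance between two weight vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(vectors, k, seed=42):
    # Start from a (seeded) random vector, then repeatedly pick the
    # vector whose minimum distance to the already selected set is
    # largest -- the classic greedy k-center traversal.
    rng = random.Random(seed)
    first = rng.randrange(len(vectors))
    selected = [first]
    min_dist = [euclidean(v, vectors[first]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [min(d, euclidean(v, vectors[nxt]))
                    for d, v in zip(min_dist, vectors)]
    return [vectors[i] for i in selected]
```

By construction the sample is spread out over the cluster, which is why the selected vectors above mix matches and non-matches rather than coming from one dense region.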
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 20 matches and 53 non-matches
Purity of oracle classification: 0.726
Entropy of oracle classification: 0.847
Number of true matches: 20
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
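The purity and entropy figures reported for each oracle sample follow the usual two-class definitions: purity is the majority-class fraction and entropy is the base-2 Shannon entropy of the match/non-match split. A minimal sketch:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    # Purity is the majority-class fraction; entropy is the Shannon
    # entropy (base 2) of the match / non-match proportions.
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)
    return purity, entropy

purity, entropy = purity_entropy(20, 53)  # the sample above
# purity = 0.726, entropy = 0.847 (to 3 decimals), as logged
```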
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)418_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 418), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)418_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
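The occurrence distribution above can be reproduced by first counting identical weight vectors and then tabulating those counts; a sketch:

```python
from collections import Counter

def occurrence_distribution(vectors):
    # Count how often each identical weight vector occurs, then count
    # how many distinct vectors share each occurrence frequency.
    per_vector = Counter(tuple(v) for v in vectors)
    return dict(Counter(per_vector.values()))
```

For the file above this yields {1: 1001, 2: 32, 3: 2, 12: 1}, i.e. almost all weight vectors are unique.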
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
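The removal of minority-class copies of non-pure vectors can be sketched as below; for example, a vector occurring 12 times with 11 match labels has pureness 0.917 and loses only its single non-match copy. The tie-breaking (pureness exactly 0.5) is an assumption, as the original's behaviour is not shown:

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    # Drop copies of a weight vector whose label disagrees with the
    # majority label among all identical copies of that vector.
    # Ties are kept; the original's tie-breaking is unknown.
    label_counts = defaultdict(lambda: [0, 0])  # vector -> [non-matches, matches]
    for v, lab in zip(vectors, labels):
        label_counts[tuple(v)][int(lab)] += 1
    kept = []
    for v, lab in zip(vectors, labels):
        non, mat = label_counts[tuple(v)]
        if (lab and mat >= non) or (not lab and non >= mat):
            kept.append((v, lab))
    return kept
```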
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
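The oracle runs at 100% accuracy throughout this log, so every classification matches the ground truth. An oracle with configurable accuracy can be simulated by flipping each ground-truth label with probability 1 − accuracy (a sketch, not the original implementation):

```python
import random

def noisy_oracle(true_labels, accuracy, seed=0):
    # Return each ground-truth label unchanged with probability
    # `accuracy`, otherwise flipped; accuracy 1.0 reproduces the
    # perfect oracle used in this run.
    rng = random.Random(seed)
    return [lab if rng.random() < accuracy else not lab
            for lab in true_labels]
```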
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
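The SVM step above trains on the oracle-labelled sample (23 matches, 65 non-matches) and classifies the remaining weight vectors. As a dependency-free stand-in, here is a Pegasos-style linear SVM trained by sub-gradient descent on the hinge loss; the original run's kernel and parameters are unknown, so treat the settings below as illustrative:

```python
import random

def train_linear_svm(X, y, epochs=500, lam=0.01, seed=0):
    # Pegasos-style sub-gradient descent on the hinge loss.
    # y holds True (match) / False (non-match), mapped to +1/-1;
    # the bias b is left unregularised.
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b, t = [0.0] * d, 0.0, 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            yi = 1.0 if y[i] else -1.0
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [(1.0 - eta * lam) * wj for wj in w]
            if yi * score < 1.0:  # margin violated: step towards X[i]
                w = [wj + eta * yi * xj for wj, xj in zip(w, X[i])]
                b += eta * yi
    return w, b

def svm_predict(w, b, x):
    # Classify x as a match iff it falls on the positive side.
    return sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0
```

After training on the labelled sample, `svm_predict` would be applied to each of the 948 remaining weight vectors to split the cluster into predicted matches and non-matches.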
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(10)68_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 68), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)68_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 223 true matches and 585 true non-matches
(27.60% true matches)
Identified 754 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 717 (95.09%)
2 : 34 (4.51%)
3 : 2 (0.27%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 754 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 564
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 754
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(754, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 754 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 754 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 669 weight vectors
Based on 27 matches and 58 non-matches
Classified 93 matches and 576 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(93, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(576, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 576 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 576 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 20 matches and 53 non-matches
Purity of oracle classification: 0.726
Entropy of oracle classification: 0.847
Number of true matches: 20
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)63_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 63), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)63_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1038 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
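The "far" selection above is a farthest-first (k-center) traversal: greedily pick the vector with the greatest distance to the already-selected set. A generic sketch under Euclidean distance (this is an assumption; the log does not state which distance the tool uses):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: select k vectors that are
    mutually far apart under Euclidean distance."""
    selected = [vectors[0]]  # seed with an arbitrary vector
    # Distance from every vector to its nearest already-selected vector.
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[idx]))
    return selected
```

This runs in O(n·k) distance computations, which is cheap at the sizes shown here (e.g. 88 from 1038).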
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
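The purity and entropy figures reported for each oracle round appear to follow the usual binary definitions: purity is the majority-class fraction of the classified sample, and entropy is the Shannon entropy (in bits) of the match/non-match split. A sketch, assuming these definitions:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Binary purity (majority-class fraction) and Shannon entropy in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

With 23 matches and 65 non-matches this reproduces the purity 0.739 and entropy 0.829 printed above.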
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 103 matches and 847 non-matches
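The SVM step propagates the oracle's labels to the remaining unlabelled vectors, splitting them into predicted matches and non-matches. A sketch using scikit-learn's `SVC` with default parameters (the kernel and parameters the tool actually uses are not shown in this log, so treat them as assumptions):

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, unlabelled_vectors):
    """Train an SVM on the oracle-labelled vectors, then split the
    remaining vectors into predicted matches and non-matches."""
    clf = SVC()  # kernel/parameters are assumptions; the log does not say
    clf.fit(train_vectors, train_labels)
    predictions = clf.predict(unlabelled_vectors)
    matches = [v for v, p in zip(unlabelled_vectors, predictions) if p]
    non_matches = [v for v, p in zip(unlabelled_vectors, predictions) if not p]
    return matches, non_matches
```

The two predicted groups then become the new clusters pushed onto the queue, as seen in Loop 2 below.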
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 103 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 103 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 43 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
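The oracle is simulated with a configurable accuracy; at 100% it simply returns the true labels, otherwise each label is flipped with probability 1 − accuracy. A hedged sketch of such a noisy oracle (illustrative names; not the tool's implementation):

```python
import random

def noisy_oracle(true_labels, accuracy, rng=None):
    """Simulate a manual oracle: return each true label, flipped with
    probability (1 - accuracy)."""
    rng = rng or random.Random()
    return [label if rng.random() < accuracy else not label
            for label in true_labels]
```

When every sampled vector is a true match, the non-match side comes back empty, which is exactly what the warning above reports.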
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(10)763_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 763), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)763_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 718
Number of entity ID pairs that occurred more than once: 0
Analyse set of 718 weight vectors
Containing 203 true matches and 515 true non-matches
(28.27% true matches)
Identified 692 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 678 (97.98%)
2 : 11 (1.59%)
3 : 2 (0.29%)
12 : 1 (0.14%)
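The occurrence histogram above (how many distinct weight vectors appear once, twice, and so on) can be reproduced with a nested `Counter`, assuming vectors are hashable tuples:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct vectors occurring that often."""
    per_vector = Counter(vectors)        # vector -> how often it occurs
    return Counter(per_vector.values())  # occurrence count -> #distinct vectors
```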
Identified 1 non-pure unique weight vector (from 692 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 177
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 514
Removed 1 non-pure weight vector
Final number of weight vectors to use: 717
Number of unique weight vectors: 692
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(692, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 692 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 692 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 608 weight vectors
Based on 27 matches and 57 non-matches
Classified 114 matches and 494 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(114, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(494, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 114 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 114 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 47 matches and 2 non-matches
Purity of oracle classification: 0.959
Entropy of oracle classification: 0.246
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(20)762_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 762), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)762_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 153 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 153 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)673_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 673), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)673_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 754
Number of entity ID pairs that occurred more than once: 0
Analyse set of 754 weight vectors
Containing 222 true matches and 532 true non-matches
(29.44% true matches)
Identified 718 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 699 (97.35%)
2 : 16 (2.23%)
3 : 2 (0.28%)
17 : 1 (0.14%)
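A frequency distribution like the one above can be rebuilt with two nested counters. This is a hedged sketch, not the tool's actual code; `weight_vectors` is a hypothetical list of float tuples:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each unique weight vector occurs, then tally
    how many unique vectors share each occurrence count."""
    per_vector = Counter(map(tuple, weight_vectors))  # vector -> occurrences
    per_count = Counter(per_vector.values())          # occurrences -> number of unique vectors
    total_unique = len(per_vector)
    for occ in sorted(per_count):
        n = per_count[occ]
        print(f"{occ} : {n} ({100.0 * n / total_unique:.2f}%)")
    return per_count

# Toy example: one vector duplicated, one unique
dist = occurrence_distribution([(1.0, 0.5), (1.0, 0.5), (0.2, 0.3)])
```

With the toy input, one unique vector occurs twice and one occurs once, mirroring the "Occurrence : Count" table printed in the log.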
Identified 1 non-pure unique weight vectors (from 718 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 529
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 753
Number of unique weight vectors: 718
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(718, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 718 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 718 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
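The "far" method used for this sample can be sketched as a greedy max-min (farthest-first) traversal; this is an assumption about the implementation, not the tool's actual code:

```python
import math

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: start from one vector and
    repeatedly add the vector whose minimum Euclidean distance to
    the already-selected set is largest."""
    selected = [vectors[seed]]
    # Minimum distance from each candidate to the selected set so far
    min_d = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], math.dist(v, vectors[idx]))
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)], k=2)
```

Starting from the first vector, the second pick is the opposite corner (1.0, 1.0), the point farthest from everything selected so far, which is the spreading behaviour visible in the diverse sample above.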
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 28 matches and 56 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
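The purity and entropy reported for each oracle sample are consistent with the standard two-class definitions (majority-class fraction and base-2 Shannon entropy); a sketch under that assumption:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction; entropy is the Shannon
    entropy (base 2) of the match / non-match split."""
    total = num_matches + num_non_matches
    purity = max(num_matches, num_non_matches) / total
    entropy = 0.0
    for count in (num_matches, num_non_matches):
        if count:
            p = count / total
            entropy -= p * math.log2(p)
    return purity, entropy

p, e = purity_entropy(28, 56)  # the Loop 1 oracle sample above
```

For 28 matches and 56 non-matches this yields purity 2/3 ≈ 0.667 and entropy ≈ 0.918, matching the values printed above.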
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 634 weight vectors
Based on 28 matches and 56 non-matches
Classified 135 matches and 499 non-matches
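The split step trains a classifier on the oracle-labelled sample and partitions the remaining vectors by its predictions. A hedged sketch using scikit-learn follows; the kernel and parameters of the actual run are unknown, and `svm_split` is a hypothetical helper, not the tool's API:

```python
from sklearn import svm

def svm_split(labelled_vecs, labels, unlabelled_vecs):
    """Train an SVM on the oracle-labelled weight vectors (1 = match,
    0 = non-match) and split the unlabelled remainder into predicted
    matches and non-matches."""
    clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(labelled_vecs, labels)
    preds = clf.predict(unlabelled_vecs)
    matches = [v for v, p in zip(unlabelled_vecs, preds) if p == 1]
    non_matches = [v for v, p in zip(unlabelled_vecs, preds) if p == 0]
    return matches, non_matches
```

Each predicted partition then re-enters the queue as its own cluster, which is why the queue length grows to 2 in the next loop.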
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(499, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 28 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.92
- Size 499 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 499 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 13 matches and 60 non-matches
Purity of oracle classification: 0.822
Entropy of oracle classification: 0.676
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(15)783_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 783), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)783_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 708
Number of entity ID pairs that occurred more than once: 0
Analyse set of 708 weight vectors
Containing 196 true matches and 512 true non-matches
(27.68% true matches)
Identified 684 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 667 (97.51%)
2 : 14 (2.05%)
3 : 2 (0.29%)
7 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 684 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.000 : 510
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 708
Number of unique weight vectors: 684
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(684, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 684 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 684 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 32 matches and 52 non-matches
Purity of oracle classification: 0.619
Entropy of oracle classification: 0.959
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 600 weight vectors
Based on 32 matches and 52 non-matches
Classified 285 matches and 315 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(285, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
(315, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
Current size of match and non-match training data sets: 32 / 52
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 315 weight vectors
- Estimated match proportion 0.381
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 315 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.667, 0.000, 0.800, 0.684, 0.667, 0.529, 0.609] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 0 matches and 70 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(20)29_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 29), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)29_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vectors (from 998 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
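The purity and entropy figures above follow directly from the match / non-match counts: purity is the fraction of the majority class, entropy the binary Shannon entropy of the match proportion. A minimal sketch of that computation (a plausible reading of the log, not the tool's own code):

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The loop-1 oracle round above: 26 matches, 61 non-matches
purity, entropy = purity_and_entropy(26, 61)
print(round(purity, 3), round(entropy, 3))  # matches the logged 0.701 / 0.880
```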
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
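The split step above trains an SVM on the oracle-labelled vectors and partitions the remaining cluster by predicted class. As a stand-in sketch (a nearest-centroid rule substituted for the actual SVM, with hypothetical function names):

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length weight vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def split_cluster(unlabelled, match_train, non_match_train):
    """Partition unlabelled weight vectors into predicted matches and
    non-matches by distance to the class centroids (stand-in for the SVM)."""
    cm, cn = centroid(match_train), centroid(non_match_train)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    matches, non_matches = [], []
    for v in unlabelled:
        (matches if dist(v, cm) < dist(v, cn) else non_matches).append(v)
    return matches, non_matches

m, n = split_cluster([[0.9] * 7, [0.1] * 7],
                     match_train=[[1.0] * 7], non_match_train=[[0.0] * 7])
print(len(m), len(n))  # 1 1
```

The two resulting sub-clusters are then pushed back onto the queue, which is why the queue length grows to 2 in the next loop.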
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 793 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 793 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.736, 1.000, 0.250, 0.290, 0.172, 0.188, 0.286] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 14 matches and 59 non-matches
Purity of oracle classification: 0.808
Entropy of oracle classification: 0.705
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
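Each loop's sampling above uses farthest-first traversal: start from an arbitrary seed vector, then repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch of that greedy selection (an illustration, not the tool's exact code):

```python
def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors (squared Euclidean)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    selected = [vectors[0]]            # arbitrary seed
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # pick the point farthest from its nearest selected neighbour
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

sample = farthest_first([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.0, 1.0]], 3)
print(sample)  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

This spreads the labelled sample across the cluster, which is why each selected batch above mixes high- and low-similarity vectors.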
Analysing file: diverg(10)852_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976
recall 0.408027
f-measure 0.575472
da 125
dm 0
ndm 0
tp 122
fp 3
tn 4.76529e+07
fn 177
Name: (10, 1 - acm diverg, 852), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)852_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 696
Number of entity ID pairs that occurred more than once: 0
Analyse set of 696 weight vectors
Containing 143 true matches and 553 true non-matches
(20.55% true matches)
Identified 662 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 633 (95.62%)
2 : 26 (3.93%)
3 : 2 (0.30%)
5 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 662 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 129
0.000 : 533
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 696
Number of unique weight vectors: 662
Time to load and analyse the weight vector file: 0.01 sec
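The per-file analysis above (unique vectors, occurrence frequencies, pureness histogram) amounts to grouping identical weight vectors and counting their match labels. A sketch under the assumption that each input row is a (vector, is_match) pair:

```python
from collections import Counter, defaultdict

def analyse(rows):
    """rows: list of (weight_vector_tuple, is_match). Returns the number of
    unique vectors, the occurrence histogram, and the pureness histogram."""
    groups = defaultdict(list)
    for vec, is_match in rows:
        groups[vec].append(is_match)
    # how many unique vectors occur once, twice, three times, ...
    occurrence = Counter(len(labels) for labels in groups.values())
    # pureness of a unique vector = proportion of its rows that are matches
    pureness = Counter(round(sum(labels) / len(labels), 3)
                       for labels in groups.values())
    return len(groups), occurrence, pureness

rows = [((1.0, 0.5), True), ((1.0, 0.5), True), ((0.2, 0.1), False)]
n_unique, occ, pure = analyse(rows)
print(n_unique, dict(occ), dict(pure))  # 2 {2: 1, 1: 1} {1.0: 1, 0.0: 1}
```

A unique vector with pureness strictly between 0 and 1 would be "non-pure" (identical similarities, conflicting labels); the runs above report 0 such vectors.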
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(662, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 662 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 662 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 26 matches and 58 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.893
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 578 weight vectors
Based on 26 matches and 58 non-matches
Classified 87 matches and 491 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(87, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
(491, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
Current size of match and non-match training data sets: 26 / 58
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 87 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 87 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 40 matches and 3 non-matches
Purity of oracle classification: 0.930
Entropy of oracle classification: 0.365
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
125.0
Analysing file: diverg(15)92_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979592
recall 0.32107
f-measure 0.483627
da 98
dm 0
ndm 0
tp 96
fp 2
tn 4.76529e+07
fn 203
Name: (15, 1 - acm diverg, 92), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)92_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 978
Number of entity ID pairs that occurred more than once: 0
Analyse set of 978 weight vectors
Containing 169 true matches and 809 true non-matches
(17.28% true matches)
Identified 941 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 910 (96.71%)
2 : 28 (2.98%)
3 : 2 (0.21%)
6 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 941 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 152
0.000 : 789
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 978
Number of unique weight vectors: 941
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(941, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 941 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 941 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 854 weight vectors
Based on 27 matches and 60 non-matches
Classified 204 matches and 650 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(204, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(650, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 650 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 650 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[1.000, 0.000, 0.333, 0.545, 0.476, 0.727, 0.762] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 0 matches and 73 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
98.0
Analysing file: diverg(10)920_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 920), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)920_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 736
Number of entity ID pairs that occurred more than once: 0
Analyse set of 736 weight vectors
Containing 196 true matches and 540 true non-matches
(26.63% true matches)
Identified 694 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 659 (94.96%)
2 : 32 (4.61%)
3 : 2 (0.29%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 694 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.000 : 520
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 736
Number of unique weight vectors: 694
Time to load and analyse the weight vector file: 0.01 sec
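The uniqueness analysis above (694 unique vectors among 736, plus a distribution of how often each unique vector repeats) amounts to a counter of counters. A small sketch, run on toy data rather than the real CSV:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each unique weight vector occurs, then report how
    many unique vectors share each occurrence count (as in the log above)."""
    vec_counts = Counter(tuple(v) for v in weight_vectors)  # vector -> count
    freq_of_freq = Counter(vec_counts.values())             # count -> number of vectors
    n_unique = len(vec_counts)
    for occ in sorted(freq_of_freq):
        n = freq_of_freq[occ]
        print(f"{occ} : {n} ({100.0 * n / n_unique:.2f}%)")
    return freq_of_freq

# Toy input (not the real data): three vectors, one of them duplicated
occurrence_distribution([(1.0, 0.5), (1.0, 0.5), (0.2, 0.9)])
# -> 1 : 1 (50.00%)
# -> 2 : 1 (50.00%)
```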
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(694, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 694 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 694 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
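Farthest-first selection, used above to draw 84 spread-out vectors from the cluster, greedily adds whichever vector is farthest (in minimum distance) from everything already chosen. A minimal sketch; the seed choice and Euclidean distance are assumptions, and the actual implementation may differ:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector, then
    repeatedly add the vector whose minimum distance to the already
    selected ones is largest."""
    selected = [vectors[0]]                 # seed choice is an assumption
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: pick 2 of 3 points; the point farthest from (0, 0) wins
print(farthest_first([(0.0, 0.0), (0.1, 0.1), (1.0, 1.0)], 2))
# -> [(0.0, 0.0), (1.0, 1.0)]
```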
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 25 matches and 59 non-matches
Purity of oracle classification: 0.702
Entropy of oracle classification: 0.878
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
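The purity and entropy reported for each oracle sample are the majority-class fraction and the binary Shannon entropy of the match proportion; with 25 matches and 59 non-matches they come out to the 0.702 and 0.878 shown above, and the estimated match proportion 0.298 is simply 25/84:

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity = fraction of the majority class; entropy = binary Shannon
    entropy of the match proportion (0 for a pure sample, 1 for 50/50)."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The oracle sample above: 25 matches, 59 non-matches
purity, entropy = purity_entropy(25, 59)
print(f"{purity:.3f} {entropy:.3f}")  # -> 0.702 0.878
```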
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 610 weight vectors
Based on 25 matches and 59 non-matches
Classified 125 matches and 485 non-matches
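The split step above trains a classifier on the 84 oracle-labeled vectors and uses it to divide the 610 remaining vectors into two child clusters. The log uses an SVM (e.g. scikit-learn's `SVC`); as a self-contained stand-in, here is the same split step with a nearest-centroid rule instead, so the sketch needs no external libraries:

```python
import math

def split_cluster(labeled, unlabeled):
    """labeled: list of (vector, is_match) from the oracle; unlabeled: the
    rest of the cluster. Returns (predicted_matches, predicted_non_matches).
    Nearest-centroid stands in for the SVM used in the log."""
    def centroid(vecs):
        return [sum(c) / len(vecs) for c in zip(*vecs)]
    m_cent = centroid([v for v, is_match in labeled if is_match])
    n_cent = centroid([v for v, is_match in labeled if not is_match])
    matches, non_matches = [], []
    for v in unlabeled:
        (matches if math.dist(v, m_cent) < math.dist(v, n_cent)
         else non_matches).append(v)
    return matches, non_matches

# Toy example: matches cluster near 1.0, non-matches near 0.0
labeled = [((0.9, 1.0), True), ((1.0, 0.9), True),
           ((0.1, 0.0), False), ((0.0, 0.2), False)]
m, n = split_cluster(labeled, [(0.8, 0.8), (0.2, 0.1)])
print(len(m), len(n))  # -> 1 1
```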
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(125, 0.7023809523809523, 0.8783609387702276, 0.2976190476190476)
(485, 0.7023809523809523, 0.8783609387702276, 0.2976190476190476)
Current size of match and non-match training data sets: 25 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 485 weight vectors
- Estimated match proportion 0.298
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 485 vectors
The selected farthest weight vectors are:
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.348, 0.429, 0.526, 0.529, 0.619] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 12 matches and 57 non-matches
Purity of oracle classification: 0.826
Entropy of oracle classification: 0.667
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(20)735_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 735), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)735_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
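The non-pure unique vector above (pureness 0.947, i.e. a vector whose 19 copies carry 18 match labels and 1 non-match label) is cleaned by dropping its minority-class copies, which is why 970 of the 971 vectors survive. A sketch of that cleanup; whether a pureness threshold gates the removal is not shown in the log, so none is applied here:

```python
from collections import defaultdict

def drop_minority_copies(pairs):
    """pairs: list of (weight_vector, is_match). For every unique vector
    carrying both labels, keep only the majority-label copies; pure
    vectors are kept untouched. Ties keep the match copies (assumption)."""
    by_vec = defaultdict(list)
    for vec, is_match in pairs:
        by_vec[tuple(vec)].append(is_match)
    kept = []
    for vec, labels in by_vec.items():
        if len(set(labels)) == 1:                       # pure: keep everything
            kept.extend((vec, lab) for lab in labels)
        else:                                           # mixed: keep majority only
            majority = labels.count(True) >= labels.count(False)
            kept.extend((vec, lab) for lab in labels if lab == majority)
    return kept

# Toy example: (1, 1) occurs 3x as a match and 1x as a non-match
pairs = [((1.0, 1.0), True)] * 3 + [((1.0, 1.0), False), ((0.0, 0.0), False)]
print(len(drop_minority_copies(pairs)))  # -> 4
```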
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 123 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 47 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(15)398_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 398), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)398_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1092
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1092 weight vectors
Containing 226 true matches and 866 true non-matches
(20.70% true matches)
Identified 1035 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 998 (96.43%)
2 : 34 (3.29%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1035 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1091
Number of unique weight vectors: 1035
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1035, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1035 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1035 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 27 matches and 61 non-matches
Purity of oracle classification: 0.693
Entropy of oracle classification: 0.889
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 947 weight vectors
Based on 27 matches and 61 non-matches
Classified 148 matches and 799 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6931818181818182, 0.8894663896628687, 0.3068181818181818)
(799, 0.6931818181818182, 0.8894663896628687, 0.3068181818181818)
Current size of match and non-match training data sets: 27 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 799 weight vectors
- Estimated match proportion 0.307
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 799 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.067, 0.550, 0.636, 0.500, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.333, 0.545, 0.476, 0.727, 0.762] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
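Farthest-first selection, used for every sample drawn in this log, greedily picks the vector whose minimum distance to the already-selected set is largest, spreading the sample across the cluster. A minimal sketch (seeding with the first vector and using Euclidean distance are assumptions; the tool's exact choices are not shown):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: each pick maximises the minimum
    distance to the vectors already selected."""
    selected = [vectors[0]]  # assumption: seed with the first vector
    # min_d[i] = distance from vectors[i] to its nearest selected vector
    min_d = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=min_d.__getitem__)
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            min_d[j] = min(min_d[j], math.dist(v, vectors[i]))
    return selected

sample = farthest_first([[0.0], [0.1], [0.5], [1.0]], 3)
print(sample)  # [[0.0], [1.0], [0.5]]
```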
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 9 matches and 65 non-matches
Purity of oracle classification: 0.878
Entropy of oracle classification: 0.534
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
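Stepping back, the loop structure visible in this log — pop a cluster from the queue, oracle-label a farthest-first sample, remove the labelled vectors, and split the remainder into two child clusters while the cluster is still impure — can be reconstructed roughly as follows. This is a sketch inferred from the log, not the tool's source; `oracle`, `sample`, and `split` are caller-supplied stand-ins:

```python
from collections import deque

def active_clustering(vectors, oracle, sample, split, budget, purity_target=0.95):
    """Rough reconstruction of the loop in this log: repeatedly pop a
    cluster, oracle-label a farthest-first sample, and split the rest."""
    queue = deque([list(vectors)])
    training = {"match": [], "non_match": []}
    labels_used = 0
    while queue and labels_used < budget:
        cluster = queue.popleft()             # "queue ordering: random" in the log
        chosen = sample(cluster)              # farthest-first selection
        labels = [oracle(v) for v in chosen]  # manual oracle classifications
        for v, is_match in zip(chosen, labels):
            training["match" if is_match else "non_match"].append(v)
        labels_used += len(chosen)
        rest = [v for v in cluster if v not in chosen]  # delete labelled vectors
        purity = max(sum(labels), len(labels) - sum(labels)) / len(labels) if labels else 1.0
        if rest and purity < purity_target:
            # In the log, an SVM trained on the labelled data performs this split.
            queue.extend(child for child in split(rest, training) if child)
    return training, labels_used

# Toy run: 1-D vectors, a perfect oracle, and a threshold "split".
oracle = lambda v: v[0] > 0.5
sample = lambda c: c[:2]
split = lambda rest, _: ([v for v in rest if v[0] > 0.5], [v for v in rest if v[0] <= 0.5])
training, used = active_clustering([[0.9], [0.1], [0.8], [0.2]], oracle, sample, split, budget=4)
print(used, len(training["match"]), len(training["non_match"]))  # 4 2 2
```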
39.0
Analysing the file: diverg(20)889_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 889), dtype: object
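The precision, recall, and f-measure rows in the block above are consistent with the reported tp/fp/fn counts under the standard definitions. A quick check (not the analysis script's own code):

```python
def prf(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts from the block above: tp=42, fp=0, fn=257.
p, r, f = prf(42, 0, 257)
print(p, round(r, 6), round(f, 6))  # 1.0 0.140468 0.246334
```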
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)889_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 754
Number of entity ID pairs that occurred more than once: 0
Analyse set of 754 weight vectors
Containing 222 true matches and 532 true non-matches
(29.44% true matches)
Identified 718 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 699 (97.35%)
2 : 16 (2.23%)
3 : 2 (0.28%)
17 : 1 (0.14%)
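The occurrence table above counts, for each multiplicity, how many unique weight vectors occur that many times. With `collections.Counter` this is a two-step count (the vectors below are toy values, not from the file):

```python
from collections import Counter

# Toy weight vectors; tuples so they are hashable.
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.5, 0.5), (0.5, 0.5), (0.2, 0.9)]

copies_per_vector = Counter(vectors)                # vector -> occurrence count
distribution = Counter(copies_per_vector.values())  # occurrences -> number of vectors

print(len(copies_per_vector))  # 3 unique weight vectors
for occ in sorted(distribution):
    share = distribution[occ] / len(copies_per_vector)
    print(f"{occ} : {distribution[occ]} ({share:.2%})")
```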
Identified 1 non-pure unique weight vector (from 718 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 529
Removed 1 non-pure weight vector
Final number of weight vectors to use: 753
Number of unique weight vectors: 718
Time to load and analyse the weight vector file: 0.01 sec
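The non-pure vector above is the one occurring 17 times: 16 copies labelled match and 1 labelled non-match (pureness 16/17 ≈ 0.941), and dropping the single minority-class copy is what shrinks the set from 754 to 753 vectors. A sketch of that clean-up rule (function and variable names are assumptions):

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """Drop minority-class copies of any weight vector that occurs with
    both labels, so that every remaining unique vector is pure.

    labelled_vectors: list of (vector_tuple, is_match) pairs.
    """
    by_vec = defaultdict(list)
    for vec, is_match in labelled_vectors:
        by_vec[vec].append(is_match)
    kept = []
    for vec, labels in by_vec.items():
        majority = sum(labels) * 2 >= len(labels)  # ties keep the match label
        kept.extend((vec, majority) for lab in labels if lab == majority)
    return kept

# A vector seen 16 times as a match and once as a non-match (pureness
# 16/17), plus one pure non-match vector: the single minority copy goes.
data = [((0.9, 0.9), True)] * 16 + [((0.9, 0.9), False)] + [((0.1, 0.1), False)]
pure = remove_minority_copies(data)
print(len(data), len(pure))  # 18 17
```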
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(718, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 718 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 718 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 28 matches and 56 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 634 weight vectors
Based on 28 matches and 56 non-matches
Classified 135 matches and 499 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(499, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 28 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.92
- Size 499 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 499 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 13 matches and 60 non-matches
Purity of oracle classification: 0.822
Entropy of oracle classification: 0.676
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)34_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 34), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)34_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 817 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 817 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 817 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 11 matches and 60 non-matches
Purity of oracle classification: 0.845
Entropy of oracle classification: 0.622
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)888_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 888), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)888_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1038 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
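The "farthest first" selection above is a greedy farthest-first traversal: starting from a seed vector, it repeatedly adds the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the cluster. A sketch assuming a Euclidean metric and the first vector as seed (the tool's actual seed choice and metric may differ):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of equal-length
    numeric tuples: seed with the first vector, then repeatedly add
    the vector maximising its minimum distance to the selected set."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

pts = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
print(farthest_first(pts, 3))  # picks the two extremes, then the centre
```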
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 103 matches and 847 non-matches
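The SVM step above trains on the oracle-labelled sample (23 matches, 65 non-matches) and classifies the 950 unlabelled weight vectors, splitting the cluster into predicted-match and predicted-non-match parts. A sketch assuming scikit-learn's `SVC` with default parameters and hypothetical toy values; the kernel and settings actually used are not shown in the log:

```python
from sklearn import svm

# Oracle-labelled training data (hypothetical toy values).
train_X = [[0.9, 0.95], [0.85, 1.0], [0.1, 0.2], [0.3, 0.1]]
train_y = [1, 1, 0, 0]           # 1 = match, 0 = non-match

clf = svm.SVC()                  # default RBF kernel (an assumption)
clf.fit(train_X, train_y)

# Classify the remaining (unlabelled) weight vectors; the predictions
# split the cluster into two sub-clusters pushed back onto the queue.
remaining = [[0.95, 0.9], [0.2, 0.15], [0.5, 0.5]]
pred = clf.predict(remaining)
matches = [v for v, p in zip(remaining, pred) if p == 1]
non_matches = [v for v, p in zip(remaining, pred) if p == 0]
```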
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 103 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 103 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 43 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(15)289_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (15, 1 - acm diverg, 289), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)289_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 975
Number of entity ID pairs that occurred more than once: 0
Analyse set of 975 weight vectors
Containing 166 true matches and 809 true non-matches
(17.03% true matches)
Identified 938 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 907 (96.70%)
2 : 28 (2.99%)
3 : 2 (0.21%)
6 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 938 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 149
0.000 : 789
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 975
Number of unique weight vectors: 938
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(938, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 938 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 938 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 851 weight vectors
Based on 27 matches and 60 non-matches
Classified 114 matches and 737 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(114, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(737, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 114 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 114 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 44 matches and 4 non-matches
Purity of oracle classification: 0.917
Entropy of oracle classification: 0.414
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing file: diverg(10)226_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 226), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)226_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 568
Number of entity ID pairs that occurred more than once: 0
Analyse set of 568 weight vectors
Containing 201 true matches and 367 true non-matches
(35.39% true matches)
Identified 535 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 521 (97.38%)
2 : 11 (2.06%)
3 : 2 (0.37%)
19 : 1 (0.19%)
Identified 1 non-pure unique weight vector (from 535 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 168
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 366
Removed 1 non-pure weight vector
Final number of weight vectors to use: 567
Number of unique weight vectors: 535
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(535, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 535 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 535 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 25 matches and 56 non-matches
Purity of oracle classification: 0.691
Entropy of oracle classification: 0.892
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 454 weight vectors
Based on 25 matches and 56 non-matches
Classified 140 matches and 314 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.691358024691358, 0.8915996278279094, 0.30864197530864196)
(314, 0.691358024691358, 0.8915996278279094, 0.30864197530864196)
Current size of match and non-match training data sets: 25 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 140 weight vectors
- Estimated match proportion 0.309
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 140 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
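The purity and entropy values reported in these oracle summaries can be reproduced with a short sketch, assuming purity is the majority-class fraction of the classified sample and entropy is the binary Shannon entropy of the match/non-match split (both assumptions inferred from the logged numbers):

```python
from math import log2

def purity(n_match, n_nonmatch):
    # Majority-class fraction of the classified sample
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    # Binary Shannon entropy of the match/non-match split
    total = n_match + n_nonmatch
    e = 0.0
    for n in (n_match, n_nonmatch):
        if n > 0:
            p = n / total
            e -= p * log2(p)
    return e

# The 50 matches / 2 non-matches classified above:
print(round(purity(50, 2), 3), round(entropy(50, 2), 3))  # → 0.962 0.235
```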
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
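The "far" initial-selection method used in these runs is a farthest-first traversal: repeatedly pick the weight vector whose distance to the nearest already-selected vector is largest. A minimal sketch (the Euclidean metric and the choice of starting vector are assumptions):

```python
def farthest_first(vectors, k, start=0):
    """Greedily select k vectors so that each new pick maximises the
    distance to its nearest already-selected vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    selected = [start]
    # nearest[i] = distance from vectors[i] to its closest selected vector
    nearest = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=nearest.__getitem__)
        selected.append(nxt)
        nearest = [min(nearest[i], dist(vectors[i], vectors[nxt]))
                   for i in range(len(vectors))]
    return selected
```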
Analysing the file: diverg(15)386_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 386), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)386_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
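The unique-vector counts, occurrence frequency distribution, and pureness values reported above can be derived with a sketch like the following (treating each weight vector as a tuple; the exact removal rule for minority-class vectors is not shown in the log and is left out):

```python
from collections import Counter

def analyse_weight_vectors(vectors, is_match):
    """vectors: list of weight-vector tuples; is_match: parallel list of bools."""
    counts = Counter(vectors)               # occurrences per unique vector
    freq_dist = Counter(counts.values())    # occurrence : number of vectors
    match_counts = Counter(v for v, m in zip(vectors, is_match) if m)
    # Pureness: fraction of records sharing this vector that are true matches
    pureness = {v: match_counts[v] / n for v, n in counts.items()}
    non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]
    return counts, freq_dist, pureness, non_pure
```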
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 24 matches and 64 non-matches
Purity of oracle classification: 0.727
Entropy of oracle classification: 0.845
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 24 matches and 64 non-matches
Classified 91 matches and 857 non-matches
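The split step above trains an SVM on the oracle-labelled vectors and classifies the remainder. As a dependency-free stand-in for the SVM used in this run, a linear SVM can be trained with Pegasos-style sub-gradient descent on the primal hinge loss (a sketch; the actual run's kernel and hyperparameters are unknown, and `lam`/`epochs` here are illustrative):

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=300, seed=0):
    """Pegasos-style SGD on the primal hinge loss.
    X: list of feature lists; y: labels in {-1, +1} (non-match / match)."""
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    rng = random.Random(seed)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                    # decaying learning rate
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [(1 - eta * lam) * wj for wj in w]   # regularisation shrink
            if y[i] * score < 1:                     # hinge loss active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In the run above, the 24 match and 64 non-match training vectors would play the role of X/y, and the remaining 948 vectors would be passed to svm_predict.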
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(91, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
(857, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
Current size of match and non-match training data sets: 24 / 64
Selected cluster with (queue ordering: random):
- Purity 0.73 and entropy 0.85
- Size 857 weight vectors
- Estimated match proportion 0.273
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 857 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 18 matches and 52 non-matches
Purity of oracle classification: 0.743
Entropy of oracle classification: 0.822
Number of true matches: 18
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)107_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 107), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)107_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 799
Number of entity ID pairs that occurred more than once: 0
Analyse set of 799 weight vectors
Containing 222 true matches and 577 true non-matches
(27.78% true matches)
Identified 745 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 708 (95.03%)
2 : 34 (4.56%)
3 : 2 (0.27%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 745 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 556
Removed 1 non-pure weight vector
Final number of weight vectors to use: 798
Number of unique weight vectors: 745
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(745, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 745 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 745 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 660 weight vectors
Based on 27 matches and 58 non-matches
Classified 147 matches and 513 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(147, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(513, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 513 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 513 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 9 matches and 63 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)227_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 227), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)227_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 541
Number of entity ID pairs that occurred more than once: 0
Analyse set of 541 weight vectors
Containing 220 true matches and 321 true non-matches
(40.67% true matches)
Identified 503 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 485 (96.42%)
2 : 15 (2.98%)
3 : 2 (0.40%)
20 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 503 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 318
Removed 1 non-pure weight vector
Final number of weight vectors to use: 540
Number of unique weight vectors: 503
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(503, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 503 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 503 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 32 matches and 48 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
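The purity and entropy figures reported after each oracle round follow directly from the match/non-match counts. A minimal sketch of the computation (not the tool's actual code, but it reproduces the logged values):

```python
import math

def purity_entropy(n_match, n_non_match):
    """Purity = fraction of the majority class; entropy (in bits) of the
    binary match/non-match split."""
    total = n_match + n_non_match
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy

# For the 32 matches and 48 non-matches above:
# purity ≈ 0.600 and entropy ≈ 0.971, matching the log.
purity, entropy = purity_entropy(32, 48)
```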
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 423 weight vectors
Based on 32 matches and 48 non-matches
Classified 142 matches and 281 non-matches
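The SVM step above trains on the oracle-labelled samples and splits the remaining weight vectors of the cluster into predicted matches and non-matches. A sketch using scikit-learn's `SVC`; the linear kernel and default parameters are assumptions, since the log does not show the tool's actual SVM configuration:

```python
from sklearn.svm import SVC

def svm_split(labelled, labels, unlabelled):
    """Train an SVM on oracle-labelled weight vectors (True = match,
    False = non-match) and partition the unlabelled vectors."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(labelled, labels)
    pred = clf.predict(unlabelled)
    matches = [v for v, p in zip(unlabelled, pred) if p]
    non_matches = [v for v, p in zip(unlabelled, pred) if not p]
    return matches, non_matches
```

In the run above the 32 + 48 labelled vectors would play the role of `labelled`/`labels`, and the remaining 423 vectors of the cluster the role of `unlabelled`.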
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6, 0.9709505944546686, 0.4)
(281, 0.6, 0.9709505944546686, 0.4)
Current size of match and non-match training data sets: 32 / 48
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 281 weight vectors
- Estimated match proportion 0.400
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 281 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.800, 0.636, 0.563, 0.545, 0.722] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
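The farthest-first selection producing sample lists like the one above is a greedy traversal: start from one vector, then repeatedly pick the remaining vector whose distance to the already-selected set is largest. A self-contained sketch (Euclidean distance is an assumption; the tool's metric is not shown in the log):

```python
import random

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: pick a random start vector, then
    repeatedly add the vector farthest from the selected set."""
    rng = random.Random(seed)
    remaining = list(range(len(vectors)))
    first = remaining.pop(rng.randrange(len(remaining)))
    selected = [first]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # distance of each remaining vector to its nearest selected vector
    min_dist = {i: dist(vectors[i], vectors[first]) for i in remaining}
    while remaining and len(selected) < k:
        far = max(remaining, key=lambda i: min_dist[i])
        remaining.remove(far)
        selected.append(far)
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist(vectors[i], vectors[far]))
    return [vectors[i] for i in selected]
```

Because each new pick maximises the distance to all previous picks, the sample spreads across the cluster rather than concentrating in one region, which is why the selected rows above mix very dissimilar weight vectors.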
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 8 matches and 61 non-matches
Purity of oracle classification: 0.884
Entropy of oracle classification: 0.518
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(10)626_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 626), dtype: object
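The precision, recall, and f-measure values in these per-file records follow directly from the tp/fp/fn counts. A minimal sketch that reproduces the figures above:

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# tp=40, fp=0, fn=259 from the record above yields
# precision 1.0, recall ≈ 0.133779, f-measure ≈ 0.235988.
p, r, f = precision_recall_f(40, 0, 259)
```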
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)626_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 908
Number of entity ID pairs that occurred more than once: 0
Analyse set of 908 weight vectors
Containing 213 true matches and 695 true non-matches
(23.46% true matches)
Identified 853 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 817 (95.78%)
2 : 33 (3.87%)
3 : 2 (0.23%)
19 : 1 (0.12%)
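A frequency distribution like the one above is a count of counts: first count how often each weight vector occurs, then count how many vectors share each occurrence count. A small sketch with hypothetical 2-dimensional vectors (the log's are 7-dimensional):

```python
from collections import Counter

# Hypothetical weight vectors; tuples so they are hashable.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9),
           (0.2, 0.9), (0.2, 0.9), (0.7, 0.1)]

occurrences = Counter(vectors)                # vector -> how often it occurs
distribution = Counter(occurrences.values())  # occurrence count -> number of vectors
for count in sorted(distribution):
    print(f"{count} : {distribution[count]}")
```

The number of unique vectors is simply `len(occurrences)`, matching the "Identified ... unique weight vectors" lines in the log.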
Identified 1 non-pure unique weight vector (from 853 unique weight vectors)
Pureness (as a percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 178
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 674
Removed 1 non-pure weight vector
Final number of weight vectors to use: 907
Number of unique weight vectors: 853
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(853, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 853 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 853 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 30 matches and 56 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 767 weight vectors
Based on 30 matches and 56 non-matches
Classified 199 matches and 568 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(199, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
(568, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
Current size of match and non-match training data sets: 30 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 199 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 61
Farthest first selection of 61 weight vectors from 199 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.875, 1.000, 0.182, 0.267, 0.237, 0.206, 0.167] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.619, 1.000, 0.103, 0.163, 0.129, 0.146, 0.213] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 61 weight vectors
The oracle will correctly classify 61 weight vectors and wrongly classify 0
Classified 40 matches and 21 non-matches
Purity of oracle classification: 0.656
Entropy of oracle classification: 0.929
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 21
Number of false non-matches: 0
Deleted 61 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(15)290_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 290), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)290_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1015
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1015 weight vectors
Containing 213 true matches and 802 true non-matches
(20.99% true matches)
Identified 963 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 928 (96.37%)
2 : 32 (3.32%)
3 : 2 (0.21%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 963 unique weight vectors)
Pureness (as a percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 181
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 781
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1014
Number of unique weight vectors: 963
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(963, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 963 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 963 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 876 weight vectors
Based on 26 matches and 61 non-matches
Classified 122 matches and 754 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(122, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(754, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 754 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 754 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 11 matches and 62 non-matches
Purity of oracle classification: 0.849
Entropy of oracle classification: 0.612
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(10)903_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 903), dtype: object
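The f-measure reported above is the harmonic mean of the reported precision and recall. A minimal sketch (the helper name `f_measure` is illustrative, not from the code that produced this log):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 score)."""
    if precision + recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)

# Values reported above for diverg(10)903_NEW.csv; the result is
# close to the logged 0.365123 (the inputs are already rounded)
print(f_measure(0.985294, 0.22408))
```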
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)903_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 634
Number of entity ID pairs that occurred more than once: 0
Analyse set of 634 weight vectors
Containing 189 true matches and 445 true non-matches
(29.81% true matches)
Identified 613 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 599 (97.72%)
2 : 11 (1.79%)
3 : 2 (0.33%)
7 : 1 (0.16%)
Identified 0 non-pure unique weight vectors (from 613 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 168
0.000 : 445
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 634
Number of unique weight vectors: 613
Time to load and analyse the weight vector file: 0.01 sec
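The uniqueness and frequency analysis above (unique vectors, then how often each occurrence count appears) can be reproduced along these lines; the toy vectors below are hypothetical, not taken from the log:

```python
from collections import Counter

# Hypothetical weight vectors, as hashable tuples
vectors = [
    (1.0, 0.5), (1.0, 0.5),              # occurs twice
    (0.2, 0.9), (0.2, 0.9), (0.2, 0.9),  # occurs three times
    (0.7, 0.1),                          # occurs once
]

unique_counts = Counter(vectors)             # occurrences of each unique vector
freq_dist = Counter(unique_counts.values())  # occurrence : vectors occurring that often

print("Number of unique weight vectors:", len(unique_counts))
for occurrence, count in sorted(freq_dist.items()):
    print(f"{occurrence} : {count}")
```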
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(613, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 613 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 613 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.364, 0.619, 0.471, 0.600, 0.533] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
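The "far" initial selection above is a farthest-first traversal: starting from a seed vector, each step picks the remaining vector whose minimum Euclidean distance to the already-selected set is largest. A minimal sketch (the function name and seed choice are assumptions, not taken from the code that produced this log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily select k vectors, each maximising the minimum
    Euclidean distance to the vectors selected so far."""
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        # pick the remaining vector farthest from the selected set
        nxt = max(remaining, key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Collinear toy points: the point at x=10 is picked before the one at x=1
print(farthest_first([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)], 2))
```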
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 29 matches and 54 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.934
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
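The purity and entropy figures above follow from the oracle's 29 matches and 54 non-matches: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A sketch:

```python
import math

def purity_entropy(n_matches: int, n_non_matches: int):
    """Majority-class purity and binary entropy of a labelled set."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = purity_entropy(29, 54)
print(round(purity, 3), round(entropy, 3))  # matches the logged 0.651 and 0.934
```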
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 530 weight vectors
Based on 29 matches and 54 non-matches
Classified 129 matches and 401 non-matches
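The SVM step trains on the 29 + 54 oracle-labelled vectors and then labels the remaining cluster members. A sketch assuming scikit-learn's `SVC` with default parameters (the log does not state the kernel or hyperparameters actually used):

```python
from sklearn.svm import SVC

def svm_classify(train_vectors, train_labels, unlabelled_vectors):
    """Fit an SVM on oracle-labelled vectors, then classify the rest."""
    clf = SVC()  # default RBF kernel; the actual settings are unknown
    clf.fit(train_vectors, train_labels)
    return clf.predict(unlabelled_vectors)

# Toy example: matches cluster near (1, 1), non-matches near (0, 0)
preds = svm_classify(
    [[0.9, 0.9], [1.0, 0.8], [0.1, 0.2], [0.0, 0.1]],
    [1, 1, 0, 0],
    [[0.95, 0.85], [0.05, 0.10]],
)
print(list(preds))
```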
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(129, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
(401, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
Current size of match and non-match training data sets: 29 / 54
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 129 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 129 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)69_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985915
recall 0.234114
f-measure 0.378378
da 71
dm 0
ndm 0
tp 70
fp 1
tn 4.76529e+07
fn 229
Name: (15, 1 - acm diverg, 69), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)69_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 872
Number of entity ID pairs that occurred more than once: 0
Analyse set of 872 weight vectors
Containing 186 true matches and 686 true non-matches
(21.33% true matches)
Identified 832 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 798 (95.91%)
2 : 31 (3.73%)
3 : 2 (0.24%)
6 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 832 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.000 : 666
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 872
Number of unique weight vectors: 832
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(832, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 832 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 832 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 746 weight vectors
Based on 28 matches and 58 non-matches
Classified 148 matches and 598 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(598, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 148 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 148 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 45 matches and 9 non-matches
Purity of oracle classification: 0.833
Entropy of oracle classification: 0.650
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
71.0
Analysing file: diverg(10)359_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.980198
recall 0.331104
f-measure 0.495
da 101
dm 0
ndm 0
tp 99
fp 2
tn 4.76529e+07
fn 200
Name: (10, 1 - acm diverg, 359), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)359_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 463
Number of entity ID pairs that occurred more than once: 0
Analyse set of 463 weight vectors
Containing 149 true matches and 314 true non-matches
(32.18% true matches)
Identified 451 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 443 (98.23%)
2 : 5 (1.11%)
3 : 2 (0.44%)
4 : 1 (0.22%)
Identified 0 non-pure unique weight vectors (from 451 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 137
0.000 : 314
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 463
Number of unique weight vectors: 451
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(451, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 451 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 451 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 28 matches and 51 non-matches
Purity of oracle classification: 0.646
Entropy of oracle classification: 0.938
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 372 weight vectors
Based on 28 matches and 51 non-matches
Classified 112 matches and 260 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.6455696202531646, 0.9379626436434423, 0.35443037974683544)
(260, 0.6455696202531646, 0.9379626436434423, 0.35443037974683544)
Current size of match and non-match training data sets: 28 / 51
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.94
- Size 112 weight vectors
- Estimated match proportion 0.354
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 112 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
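The farthest-first selections logged in this run can be sketched as a greedy max-min traversal; starting from the first vector is an assumption made here for determinism (the actual tool may pick a random start):

```python
import numpy as np

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly add the vector whose
    minimum Euclidean distance to the already-selected set is largest."""
    X = np.asarray(vectors, dtype=float)
    selected = [0]                            # deterministic start (assumption)
    dist = np.linalg.norm(X - X[0], axis=1)   # min distance to the selected set
    while len(selected) < k:
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# toy example: the two extremes are picked before the middle point
farthest_first([[0.0], [0.4], [1.0]], 3)   # -> [0, 2, 1]
```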
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 43 matches and 6 non-matches
Purity of oracle classification: 0.878
Entropy of oracle classification: 0.536
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
101.0
Analysing the file: diverg(20)907_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 907), dtype: object
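The precision, recall, and f-measure in the Series above are consistent with its tp/fp/fn counts. A quick sketch of the standard definitions:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary-classification metrics from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# tp=39, fp=0, fn=260 as reported above
p, r, f = precision_recall_f1(39, 0, 260)   # -> (1.0, 0.1304..., 0.2307...)
```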
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)907_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
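A distribution like the one above takes two nested counts: how often each distinct weight vector occurs, then how many vectors share each occurrence count. A minimal sketch with made-up example vectors:

```python
from collections import Counter

# hypothetical weight vectors, stored as hashable tuples
vectors = [
    (1.0, 0.5), (1.0, 0.5),              # occurs twice
    (0.2, 0.9), (0.2, 0.9), (0.2, 0.9),  # occurs three times
    (0.7, 0.1),                          # occurs once
]
occurrences = Counter(vectors)                # vector -> occurrence count
distribution = Counter(occurrences.values())  # occurrence count -> number of vectors
```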
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
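The pureness filter keeps only the majority label for each unique weight vector: the single non-pure vector above (pureness 0.950 — consistent with 19 matches and 1 non-match among the 20 copies of the vector that occurs 20 times) loses its one minority-class copy, taking 1076 vectors down to 1075. A sketch of that rule, with a hypothetical repeated vector:

```python
from collections import Counter

def remove_minority(copies):
    """copies: (vector, label) pairs for one unique weight vector.
    Returns the copies kept after dropping minority-label ones, plus pureness."""
    labels = Counter(lbl for _, lbl in copies)
    pureness = max(labels.values()) / len(copies)
    if pureness == 1.0:                      # already pure: keep everything
        return copies, pureness
    minority = min(labels, key=labels.get)   # the less frequent label
    return [c for c in copies if c[1] != minority], pureness

# hypothetical vector with 19 match copies and 1 non-match copy
copies = [((1.0, 0.9), True)] * 19 + [((1.0, 0.9), False)]
kept, pureness = remove_minority(copies)     # pureness 0.95, 19 copies kept
```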
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 112 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 112 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and wrongly classify 0
Classified 44 matches and 1 non-match
Purity of oracle classification: 0.978
Entropy of oracle classification: 0.154
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)220_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 220), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)220_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 844
Number of entity ID pairs that occurred more than once: 0
Analyse set of 844 weight vectors
Containing 209 true matches and 635 true non-matches
(24.76% true matches)
Identified 797 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 762 (95.61%)
2 : 32 (4.02%)
3 : 2 (0.25%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 797 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 843
Number of unique weight vectors: 797
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(797, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 797 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 797 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 26 matches and 59 non-matches
Purity of oracle classification: 0.694
Entropy of oracle classification: 0.888
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 712 weight vectors
Based on 26 matches and 59 non-matches
Classified 123 matches and 589 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
(589, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
Current size of match and non-match training data sets: 26 / 59
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 589 weight vectors
- Estimated match proportion 0.306
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 589 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 16 matches and 56 non-matches
Purity of oracle classification: 0.778
Entropy of oracle classification: 0.764
Number of true matches: 16
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)244_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 244), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)244_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 0 matches and 829 non-matches
40.0
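The "Farthest first selection of k weight vectors from n vectors" steps that recur throughout this log can be sketched as a standard greedy farthest-first traversal. This is a hedged reconstruction, not the tool's actual code: the function name, the random choice of starting vector, and the use of Euclidean distance are all assumptions.

```python
import numpy as np

def farthest_first_selection(vectors: np.ndarray, k: int, seed: int = 0) -> list[int]:
    """Greedy farthest-first traversal: start from a random vector, then
    repeatedly pick the vector whose distance to the already-selected set
    is largest (a minimal sketch; the original tool may differ in details)."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    selected = [int(rng.integers(n))]
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < min(k, n):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

The point of this selection strategy is visible in the sample lists above: the chosen weight vectors spread across the whole similarity space rather than clustering near one region.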
Analysing the file: diverg(15)235_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (15, 1 - acm diverg, 235), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)235_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 724
Number of entity ID pairs that occurred more than once: 0
Analyse set of 724 weight vectors
Containing 212 true matches and 512 true non-matches
(29.28% true matches)
Identified 671 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 636 (94.78%)
2 : 32 (4.77%)
3 : 2 (0.30%)
18 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 671 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 491
Removed 1 non-pure weight vector
Final number of weight vectors to use: 723
Number of unique weight vectors: 671
Time to load and analyse the weight vector file: 0.01 sec
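The analysis block above (unique vectors, occurrence frequencies, and removal of minority-class copies from non-pure unique vectors) can be sketched as follows. This is illustrative only: the function name and grouping strategy are assumptions, though the behaviour matches the log, where one non-pure unique vector (pureness 0.944) loses its single minority-class copy, reducing 724 vectors to 723.

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors, count how often each unique vector
    occurs, and drop minority-class copies of non-pure groups."""
    groups: dict[tuple, list[bool]] = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(lab)
    # Occurrence : number of unique weight vectors that occur that often
    freq = Counter(len(labs) for labs in groups.values())
    kept = []
    for vec, labs in groups.items():
        pureness = sum(labs) / len(labs)  # fraction of True (match) labels
        majority = pureness >= 0.5
        # keep only the majority-class copies of each unique vector
        kept.extend((vec, majority) for lab in labs if lab == majority)
    return freq, kept
```

For instance, a unique vector occurring 18 times with 17 match labels has pureness 17/18 ≈ 0.944; the single non-match copy is removed, as reported above.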
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(671, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 671 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 671 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 587 weight vectors
Based on 27 matches and 57 non-matches
Classified 142 matches and 445 non-matches
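The "SVM classification of N weight vectors / Based on m matches and n non-matches" step trains a classifier on the oracle-labelled sample and splits the cluster's remaining vectors into predicted matches and non-matches. A minimal sketch using scikit-learn; the library, kernel choice, and function name are assumptions, as the original implementation is not shown here.

```python
import numpy as np
from sklearn.svm import SVC

def split_cluster_with_svm(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on the oracle-labelled sample, then split the
    remaining weight vectors into predicted matches / non-matches."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(np.asarray(train_vecs), np.asarray(train_labels))
    pred = clf.predict(np.asarray(remaining_vecs))
    matches = [v for v, p in zip(remaining_vecs, pred) if p]
    non_matches = [v for v, p in zip(remaining_vecs, pred) if not p]
    return matches, non_matches
```

The two resulting sub-clusters are then re-queued, which is why the queue length grows from 1 to 2 in the next loop.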
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(445, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 142 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
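The purity and entropy figures reported after each oracle call follow the usual binary definitions: purity is the majority-class fraction, entropy the base-2 Shannon entropy of the match/non-match proportions. A minimal sketch (the function name is mine):

```python
import math

def purity_and_entropy(num_matches: int, num_non_matches: int) -> tuple[float, float]:
    """Purity = fraction of the majority class; entropy = Shannon
    entropy (base 2) of the match / non-match proportions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

For example, `purity_and_entropy(27, 57)` gives approximately (0.679, 0.906), matching the cluster queue entries `(142, 0.6785714285714286, 0.9059282160429992, ...)` reported in Loop 2 above.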
48.0
Analysing the file: diverg(15)509_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 509), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)509_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1023
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1023 weight vectors
Containing 222 true matches and 801 true non-matches
(21.70% true matches)
Identified 969 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 932 (96.18%)
2 : 34 (3.51%)
3 : 2 (0.21%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 969 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 780
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1022
Number of unique weight vectors: 969
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(969, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 969 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 969 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 882 weight vectors
Based on 31 matches and 56 non-matches
Classified 302 matches and 580 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(302, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(580, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 302 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 302 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.600, 1.000, 0.217, 0.132, 0.167, 0.125, 0.188] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 43 matches and 25 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
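Putting the pieces together, each run above follows the same shape: pick a cluster from the queue, draw a farthest-first sample, label it with the oracle, and split the unlabelled remainder into two new clusters until the manual classification budget is exhausted. The following is a schematic reconstruction only; all names, the queue policy, and the budget check are illustrative assumptions.

```python
def active_learning_loop(clusters, oracle, budget, sample_fn, split_fn):
    """Schematic cluster-queue loop: sample from a cluster, label the
    sample with the oracle, and split the rest until the budget is spent."""
    labelled = 0
    training = []  # (vector, label) pairs accumulated from oracle calls
    queue = list(clusters)
    while queue and labelled < budget:
        cluster = queue.pop(0)  # the log notes "queue ordering: random"
        sample = sample_fn(cluster)
        if labelled + len(sample) > budget:
            break  # "Reached end of manual classification budget"
        labels = [oracle(v) for v in sample]
        labelled += len(sample)
        training.extend(zip(sample, labels))
        rest = [v for v in cluster if v not in sample]
        if rest:
            # e.g. an SVM split of the remainder into two sub-clusters
            queue.extend(split_fn(training, rest))
    return training, labelled
```

This matches the counters in the log: after Loop 1 of the last run, 88 manual classifications would have been performed and the queue would hold the two SVM-split sub-clusters.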
42.0
Analysing the file: diverg(20)267_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 267), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)267_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1075
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1075 weight vectors
Containing 208 true matches and 867 true non-matches
(19.35% true matches)
Identified 1028 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 993 (96.60%)
2 : 32 (3.11%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1028 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1074
Number of unique weight vectors: 1028
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1028, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1028 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1028 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
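The purity and entropy figures reported above follow directly from the oracle's match/non-match counts: purity is the majority-class fraction and entropy is the binary entropy in bits (assumed semantics, but they reproduce the logged values exactly). A minimal sketch:

```python
import math

def purity_entropy(n_matches: int, n_non_matches: int):
    """Purity = majority-class fraction; entropy = binary entropy in bits."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Reproduces the figures logged for 25 matches / 63 non-matches:
purity, entropy = purity_entropy(25, 63)
print(round(purity, 3), round(entropy, 3))  # 0.716 0.861
```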
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 940 weight vectors
Based on 25 matches and 63 non-matches
Classified 123 matches and 817 non-matches
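The step above trains a classifier on the 25 oracle-labelled matches and 63 non-matches, then splits the remaining unlabelled weight vectors into predicted matches and non-matches. The log reports an SVM (scikit-learn's `SVC` would be the natural choice); as a dependency-free stand-in, the same train-then-classify step can be sketched with a simple perceptron (`train_perceptron` and `classify` are hypothetical helper names, not from the tool):

```python
def train_perceptron(pos, neg, epochs=100, lr=0.1):
    """Linear classifier as a stand-in for the SVM reported in the log."""
    dim = len(pos[0])
    w = [0.0] * dim
    b = 0.0
    data = [(v, 1) for v in pos] + [(v, -1) for v in neg]
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(w, b, vectors):
    """Split unlabelled weight vectors into predicted matches / non-matches."""
    matches, non_matches = [], []
    for x in vectors:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        (matches if score > 0 else non_matches).append(x)
    return matches, non_matches
```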
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 123 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(10)752_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990099
recall 0.334448
f-measure 0.5
da 101
dm 0
ndm 0
tp 100
fp 1
tn 4.76529e+07
fn 199
Name: (10, 1 - acm diverg, 752), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)752_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 779
Number of entity ID pairs that occurred more than once: 0
Analyse set of 779 weight vectors
Containing 165 true matches and 614 true non-matches
(21.18% true matches)
Identified 740 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 711 (96.08%)
2 : 26 (3.51%)
3 : 2 (0.27%)
10 : 1 (0.14%)
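The occurrence histogram above (how many unique weight vectors appear once, twice, and so on, with percentages relative to the number of unique vectors) is a straightforward double count. A sketch, assuming the vectors are hashable tuples:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(map(tuple, vectors))  # vector -> how often it occurs
    dist = Counter(per_vector.values())        # occurrence -> unique-vector count
    total_unique = len(per_vector)
    for occ in sorted(dist):
        print(f"{occ} : {dist[occ]} ({dist[occ] / total_unique:.2%})")
    return dist
```

For the 779 vectors above this yields 711 unique vectors occurring once, 26 twice, 2 three times, and 1 ten times, i.e. 740 unique vectors in total.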
Identified 1 non-pure unique weight vector (from 740 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 146
0.900 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 593
Removed 1 non-pure weight vector
Final number of weight vectors to use: 778
Number of unique weight vectors: 740
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(740, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 740 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 740 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
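Farthest-first traversal, used for the selections listed above, greedily picks the vector with the largest distance to its nearest already-selected vector, spreading the sample across the cluster. A Euclidean sketch (the tool's actual distance metric and seeding are not stated in the log):

```python
import math
import random

def farthest_first(vectors, k, seed=42):
    """Greedy farthest-first traversal: start from a random vector, then
    repeatedly add the vector farthest from the current selection."""
    rng = random.Random(seed)
    remaining = list(vectors)
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    while len(selected) < k and remaining:
        def min_dist(v):
            return min(math.dist(v, s) for s in selected)
        idx = max(range(len(remaining)), key=lambda i: min_dist(remaining[i]))
        selected.append(remaining.pop(idx))
    return selected
```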
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 655 weight vectors
Based on 29 matches and 56 non-matches
Classified 116 matches and 539 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(116, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(539, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 116 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 50
Farthest first selection of 50 weight vectors from 116 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.929, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 50 weight vectors
The oracle will correctly classify 50 weight vectors and wrongly classify 0
Classified 42 matches and 8 non-matches
Purity of oracle classification: 0.840
Entropy of oracle classification: 0.634
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 50 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
101.0
Analysing the file: diverg(20)820_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 820), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)820_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)203_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 203), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)203_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1068
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1068 weight vectors
Containing 226 true matches and 842 true non-matches
(21.16% true matches)
Identified 1011 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 974 (96.34%)
2 : 34 (3.36%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vectors (from 1011 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 821
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1067
Number of unique weight vectors: 1011
Time to load and analyse the weight vector file: 0.01 sec
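The analysis step above groups identical weight vectors, builds the occurrence frequency distribution, and computes each unique vector's pureness (fraction of its occurrences that are true matches). A self-contained sketch of that bookkeeping (names and the toy data are illustrative):

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Frequency distribution and pureness of unique weight vectors."""
    counts = Counter(map(tuple, vectors))
    match_counts = Counter()
    for v, is_match in zip(vectors, labels):
        if is_match:
            match_counts[tuple(v)] += 1
    # Occurrence : number of unique vectors that occur that often
    freq_dist = Counter(counts.values())
    # Pureness = fraction of a vector's occurrences that are true matches
    pureness = {v: match_counts[v] / n for v, n in counts.items()}
    return freq_dist, pureness

# Toy example: one vector occurs 3 times (2 of them matches), two occur once.
vecs = [[0.9], [0.9], [0.1], [0.9], [0.5]]
labs = [True, True, False, False, False]
freq, pure = analyse_weight_vectors(vecs, labs)
print(dict(freq))               # → {3: 1, 1: 2}
print(round(pure[(0.9,)], 3))   # → 0.667
```

A unique vector with pureness strictly between 0 and 1 is "non-pure"; the tool removes its minority-class occurrences, as in the "Removed 1 non-pure weight vectors" line above.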
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1011, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1011 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1011 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
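The "far" initial selection listed above is a farthest-first traversal: starting from a seed vector, it repeatedly adds the vector whose minimum distance to the already-selected set is largest, yielding a well-spread sample. A minimal sketch, assuming Euclidean distance and a fixed seed (the actual implementation's seeding and metric may differ):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of numeric vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # Minimum distance from every vector to the selected set so far.
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

# Toy example: pick 3 spread-out points from a 1-D set.
pts = [[0.0], [0.1], [0.5], [0.9], [1.0]]
print(farthest_first(pts, 3))  # → [0, 4, 2]
```

Each greedy step costs O(n) distance updates, so selecting k of n vectors is O(nk), which is cheap at the cluster sizes seen in this log.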
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 924 weight vectors
Based on 24 matches and 63 non-matches
Classified 131 matches and 793 non-matches
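The classification step above trains on the oracle-labelled sample and propagates labels to the remaining vectors, splitting the cluster into a predicted-match and a predicted-non-match sub-cluster. The log uses an SVM; for a dependency-free sketch of the same train-on-sample/predict-the-rest pattern, here is a simple perceptron as a stand-in linear classifier (data and hyperparameters are illustrative):

```python
def train_linear(X, y, epochs=100, lr=0.1):
    """Simple perceptron stand-in for the SVM used in the log."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t = 1 if yi == 1 else -1          # map labels to {-1, +1}
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if t * score <= 0:                # misclassified: update
                w = [wj + lr * t * xj for wj, xj in zip(w, xi)]
                b += lr * t
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Oracle-labelled sample: high-similarity vectors are matches (1).
train = [[0.9, 0.8], [0.95, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w, b = train_linear(train, y)
# Propagate labels to the rest of the cluster.
rest = [[0.85, 0.9], [0.15, 0.05]]
print([predict(w, b, x) for x in rest])  # → [1, 0]
```

The predicted-match and predicted-non-match subsets then re-enter the queue as the two clusters shown at the start of the next loop.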
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(793, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 793 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 793 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 12 matches and 58 non-matches
Purity of oracle classification: 0.829
Entropy of oracle classification: 0.661
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)641_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.977273
recall 0.431438
f-measure 0.598608
da 132
dm 0
ndm 0
tp 129
fp 3
tn 4.76529e+07
fn 170
Name: (10, 1 - acm diverg, 641), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)641_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 771
Number of entity ID pairs that occurred more than once: 0
Analyse set of 771 weight vectors
Containing 107 true matches and 664 true non-matches
(13.88% true matches)
Identified 740 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 712 (96.22%)
2 : 25 (3.38%)
3 : 3 (0.41%)
Identified 0 non-pure unique weight vectors (from 740 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 96
0.000 : 644
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 771
Number of unique weight vectors: 740
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(740, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 740 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 740 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 655 weight vectors
Based on 27 matches and 58 non-matches
Classified 86 matches and 569 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(86, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(569, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 569 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 569 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.233, 0.545, 0.714, 0.455, 0.238] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.917, 0.000, 0.550, 0.455, 0.455, 0.000, 0.000] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.815, 0.643, 0.800, 0.750, 0.429] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.800, 0.000, 0.556, 0.182, 0.500, 0.071, 0.400] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 0 matches and 72 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
132.0
Analysing file: diverg(20)455_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 455), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)455_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vectors (from 998 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
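The purity and entropy values in the oracle summaries above follow directly from the match/non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A minimal sketch (the function name is my own, not from the original code):

```python
import math

def purity_and_entropy(n_matches, n_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy of the match proportion, as reported in the log."""
    total = n_matches + n_non_matches
    p = n_matches / total  # match proportion
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Reproduces the Loop 1 summary above: 26 matches, 61 non-matches
purity, entropy = purity_and_entropy(26, 61)
print(round(purity, 3), round(entropy, 3))  # 0.701 0.88
```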
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 793 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 793 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.736, 1.000, 0.250, 0.290, 0.172, 0.188, 0.286] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
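The "farthest first" selection producing lists like the one above is, presumably, a greedy farthest-first traversal: start from one vector, then repeatedly add the vector whose minimum Euclidean distance to the already-selected set is largest. A dependency-free sketch on toy 2-D points (not the original implementation):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: each step adds the vector
    farthest (minimum Euclidean distance) from the current selection."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # minimum distance from every vector to the selected set so far
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

vecs = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
print(farthest_first(vecs, 3))  # [0, 1, 4]
```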
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 14 matches and 59 non-matches
Purity of oracle classification: 0.808
Entropy of oracle classification: 0.705
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analyzing file: diverg(10)399_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 399), dtype: object
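The precision, recall, and f-measure entries in the dump above are mutually consistent with the tp/fp/fn counts (the f-measure is the harmonic mean of precision and recall, equivalently 2·tp / (2·tp + fp + fn)). A quick check (variable names are my own):

```python
tp, fp, fn = 68, 1, 231  # counts from the Series dump above

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(round(precision, 6), round(recall, 6), round(f_measure, 6))
# 0.985507 0.227425 0.369565
```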
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)399_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 491
Number of entity ID pairs that occurred more than once: 0
Analyse set of 491 weight vectors
Containing 172 true matches and 319 true non-matches
(35.03% true matches)
Identified 473 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 461 (97.46%)
2 : 9 (1.90%)
3 : 2 (0.42%)
6 : 1 (0.21%)
Identified 0 non-pure unique weight vectors (from 473 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 154
0.000 : 319
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 491
Number of unique weight vectors: 473
Time to load and analyse the weight vector file: 0.00 sec
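The unique-vector count and the frequency distribution reported when a weight vector file is loaded can be reproduced with two `collections.Counter` passes: one over the vectors, one over their counts. A toy sketch (the five 2-D vectors are made up, not data from this log):

```python
from collections import Counter

# Hypothetical miniature stand-in for the loaded file; the real file
# holds hundreds of rows of 7 similarity weights each.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9), (0.0, 0.0), (1.0, 0.5)]

counts = Counter(vectors)               # vector -> how often it occurs
print("unique:", len(counts))
freq_dist = Counter(counts.values())    # occurrence -> number of vectors
for occ in sorted(freq_dist):
    n = freq_dist[occ]
    # percentage is taken over the unique vectors, as in the log
    print(f"{occ} : {n} ({100 * n / len(counts):.2f}%)")
```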
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(473, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 473 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 473 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 22 matches and 58 non-matches
Purity of oracle classification: 0.725
Entropy of oracle classification: 0.849
Number of true matches: 22
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 393 weight vectors
Based on 22 matches and 58 non-matches
Classified 96 matches and 297 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(96, 0.725, 0.8485481782946158, 0.275)
(297, 0.725, 0.8485481782946158, 0.275)
Current size of match and non-match training data sets: 22 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 96 weight vectors
- Estimated match proportion 0.275
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 96 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.933, 1.000, 0.952, 1.000, 1.000, 0.944, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analyzing file: diverg(10)85_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 85), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)85_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 664
Number of entity ID pairs that occurred more than once: 0
Analyse set of 664 weight vectors
Containing 200 true matches and 464 true non-matches
(30.12% true matches)
Identified 619 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 585 (94.51%)
2 : 31 (5.01%)
3 : 2 (0.32%)
11 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 619 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.909 : 1 (minority-class occurrences of weight vectors with this pureness are removed)
0.000 : 443
Removed 1 non-pure weight vector
Final number of weight vectors to use: 663
Number of unique weight vectors: 619
Time to load and analyse the weight vector file: 0.01 sec
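The non-pure-vector cleanup above (664 vectors in, 663 out) groups records by their weight vector, computes pureness as the match fraction, and drops the minority-class records of any vector whose pureness lies strictly between 0 and 1. A dependency-free sketch with made-up toy data (function name and tie-breaking rule are my own assumptions):

```python
from collections import defaultdict

def remove_minority_records(records):
    """records: list of (weight_vector, is_match) pairs.
    For each non-pure unique vector (0 < match fraction < 1), keep only
    the majority-class records, as described in the log output above."""
    groups = defaultdict(list)
    for vec, is_match in records:
        groups[vec].append(is_match)
    kept = []
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)
        if 0.0 < pureness < 1.0:
            majority = pureness >= 0.5  # assumption: a tie keeps matches
            kept.extend((vec, lab) for lab in labels if lab == majority)
        else:
            kept.extend((vec, lab) for lab in labels)
    return kept

# Toy data: one vector with 10 matches + 1 non-match (pureness 0.909),
# plus a pure non-match vector occurring 3 times.
recs = [((0.9, 0.8), True)] * 10 + [((0.9, 0.8), False)] + [((0.1, 0.2), False)] * 3
print(len(remove_minority_records(recs)))  # 13: the single minority record is dropped
```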
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(619, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 619 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 619 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 536 weight vectors
Based on 28 matches and 55 non-matches
Classified 159 matches and 377 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(159, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(377, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 159 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 159 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 45 matches and 11 non-matches
Purity of oracle classification: 0.804
Entropy of oracle classification: 0.715
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 11
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analyzing file: diverg(15)589_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 589), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)589_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 777
Number of entity ID pairs that occurred more than once: 0
Analyse set of 777 weight vectors
Containing 223 true matches and 554 true non-matches
(28.70% true matches)
Identified 723 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 686 (94.88%)
2 : 34 (4.70%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 723 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority-class occurrences of weight vectors with this pureness are removed)
0.000 : 533
Removed 1 non-pure weight vector
Final number of weight vectors to use: 776
Number of unique weight vectors: 723
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(723, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 723 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 723 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
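The farthest-first selection used above is a standard greedy traversal: start from a seed vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch, where the Euclidean metric and the seed choice are assumptions (the log does not show them):

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedily select k vectors, each maximising the minimum
    Euclidean distance to the vectors already selected."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [seed_index]
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = [dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return [vectors[i] for i in selected]
```

Because each new pick maximises the min-distance, the sample spreads across the whole weight-vector space, which is why the 85 vectors listed above mix clear matches, clear non-matches, and borderline cases.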
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
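The purity and entropy figures reported for each oracle round follow directly from the match/non-match counts: with m matches and n non-matches, purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A sketch reproducing the numbers above:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity, binary entropy, and match proportion of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total        # estimated match proportion
    purity = max(p, 1 - p)         # majority-class fraction
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy, p

# The 27 matches / 58 non-matches classified above give
# purity ~ 0.682, entropy ~ 0.902, match proportion ~ 0.318
purity, entropy, p = cluster_stats(27, 58)
```

The same match proportion (0.318) is what the queue entries in Loop 2 carry as the estimated match proportion of the split clusters.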
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 638 weight vectors
Based on 27 matches and 58 non-matches
Classified 114 matches and 524 non-matches
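After the oracle round, the remaining unlabelled weight vectors are classified by an SVM trained on the oracle-labelled sample, splitting the cluster into the two queue entries seen in the next loop. A minimal sketch using scikit-learn's SVC; the linear kernel and default parameters are assumptions, since the log does not show the tool's SVM settings:

```python
from sklearn import svm

def classify_remaining(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on the oracle-labelled sample and split the
    remaining weight vectors into predicted matches / non-matches."""
    clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches
```

In the run above the SVM trained on 27 matches and 58 non-matches splits the 638 remaining vectors into 114 predicted matches and 524 predicted non-matches, which become the two clusters in the Loop 2 queue.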
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(114, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(524, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 524 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 524 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.714, 0.727, 0.750, 0.294, 0.833] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.348, 0.429, 0.526, 0.529, 0.619] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 18 matches and 54 non-matches
Purity of oracle classification: 0.750
Entropy of oracle classification: 0.811
Number of true matches: 18
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)877_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 877), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)877_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 672
Number of entity ID pairs that occurred more than once: 0
Analyse set of 672 weight vectors
Containing 217 true matches and 455 true non-matches
(32.29% true matches)
Identified 639 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 623 (97.50%)
2 : 13 (2.03%)
3 : 2 (0.31%)
17 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 639 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 454
Removed 1 non-pure weight vector
Final number of weight vectors to use: 671
Number of unique weight vectors: 639
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(639, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 639 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 639 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 30 matches and 53 non-matches
Purity of oracle classification: 0.639
Entropy of oracle classification: 0.944
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 556 weight vectors
Based on 30 matches and 53 non-matches
Classified 154 matches and 402 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(154, 0.6385542168674698, 0.943876757128791, 0.3614457831325301)
(402, 0.6385542168674698, 0.943876757128791, 0.3614457831325301)
Current size of match and non-match training data sets: 30 / 53
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 402 weight vectors
- Estimated match proportion 0.361
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 402 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.565, 0.667, 0.600, 0.412, 0.381] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.556, 0.429, 0.500, 0.700, 0.643] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 4 matches and 69 non-matches
Purity of oracle classification: 0.945
Entropy of oracle classification: 0.306
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(10)240_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 240), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)240_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 804
Number of entity ID pairs that occurred more than once: 0
Analyse set of 804 weight vectors
Containing 188 true matches and 616 true non-matches
(23.38% true matches)
Identified 762 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 731 (95.93%)
2 : 28 (3.67%)
3 : 2 (0.26%)
11 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 762 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 595
Removed 1 non-pure weight vector
Final number of weight vectors to use: 803
Number of unique weight vectors: 762
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(762, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 762 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 762 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.722, 0.471, 0.545, 0.579] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.800, 0.000, 0.556, 0.182, 0.500, 0.071, 0.400] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[0.344, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.033, 0.300, 0.524, 0.727, 0.762] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 677 weight vectors
Based on 27 matches and 58 non-matches
Classified 130 matches and 547 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(130, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(547, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 130 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 130 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
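The farthest-first selection used above greedily picks the vector whose minimum distance to the already-selected set is largest, producing a diverse sample for the oracle. A sketch assuming squared Euclidean distance and a fixed seed index (the run's actual metric and seeding rule are not shown in the log):

```python
def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of weight vectors.

    Returns the indices of k selected vectors. `start` is the seed index
    (an assumption; the log does not show how the first vector is chosen).
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    selected = [start]
    # min_dist[i] = squared distance from vectors[i] to its nearest selected vector
    min_dist = [sq_dist(v, vectors[start]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            d = sq_dist(v, vectors[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

Each iteration costs one distance update pass over the cluster, so selecting k of n vectors is O(k·n) distance computations.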
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 46 matches and 5 non-matches
Purity of oracle classification: 0.902
Entropy of oracle classification: 0.463
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(10)314_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.977444
recall 0.434783
f-measure 0.601852
da 133
dm 0
ndm 0
tp 130
fp 3
tn 4.76529e+07
fn 169
Name: (10, 1 - acm diverg, 314), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)314_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 769
Number of entity ID pairs that occurred more than once: 0
Analyse set of 769 weight vectors
Containing 124 true matches and 645 true non-matches
(16.12% true matches)
Identified 738 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 710 (96.21%)
2 : 25 (3.39%)
3 : 3 (0.41%)
Identified 0 non-pure unique weight vectors (from 738 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 113
0.000 : 625
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 769
Number of unique weight vectors: 738
Time to load and analyse the weight vector file: 0.01 sec
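The uniqueness and pureness analysis above (occurrence frequencies and the fraction of true matches per unique weight vector) can be sketched with a pair of counters (the function name is mine):

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Occurrence frequency and pureness of unique weight vectors.

    `vectors` is a list of weight tuples, `labels` the true match flag per
    vector. Pureness of a unique vector is the fraction of its occurrences
    that are true matches.
    """
    occ = Counter(map(tuple, vectors))           # unique vector -> count
    freq = Counter(occ.values())                 # occurrence -> number of unique vectors
    matches = Counter(tuple(v) for v, m in zip(vectors, labels) if m)
    pureness = {v: matches[v] / n for v, n in occ.items()}
    non_pure = sum(1 for p in pureness.values() if 0.0 < p < 1.0)
    return freq, pureness, non_pure
```

A unique vector is non-pure when its occurrences mix matches and non-matches (pureness strictly between 0 and 1); the log removes the minority-class occurrences of such vectors before clustering.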
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(738, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 738 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 738 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 653 weight vectors
Based on 30 matches and 55 non-matches
Classified 103 matches and 550 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(550, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 103 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 103 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.875, 1.000, 0.182, 0.267, 0.237, 0.206, 0.167] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.947, 1.000, 0.292, 0.178, 0.227, 0.122, 0.154] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 37 matches and 11 non-matches
Purity of oracle classification: 0.771
Entropy of oracle classification: 0.777
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 11
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
133.0
Analysing file: diverg(20)307_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 307), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)307_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 789
Number of entity ID pairs that occurred more than once: 0
Analyse set of 789 weight vectors
Containing 225 true matches and 564 true non-matches
(28.52% true matches)
Identified 750 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 731 (97.47%)
2 : 16 (2.13%)
3 : 2 (0.27%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 750 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 561
Removed 1 non-pure weight vector
Final number of weight vectors to use: 788
Number of unique weight vectors: 750
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(750, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 750 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 750 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 34 matches and 51 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 665 weight vectors
Based on 34 matches and 51 non-matches
Classified 153 matches and 512 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6, 0.9709505944546686, 0.4)
(512, 0.6, 0.9709505944546686, 0.4)
Current size of match and non-match training data sets: 34 / 51
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 153 weight vectors
- Estimated match proportion 0.400
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 153 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 50 matches and 8 non-matches
Purity of oracle classification: 0.862
Entropy of oracle classification: 0.579
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)861_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 861), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)861_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
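The purity and entropy figures reported above follow directly from the oracle's match / non-match counts (purity is the majority-class fraction, entropy the Shannon entropy of the two class proportions). A minimal sketch, with an illustrative function name not taken from the logged program:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority class fraction; entropy = base-2 Shannon
    entropy of the match / non-match proportions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# For the 24 matches / 63 non-matches above this reproduces the
# logged purity 0.724 and entropy 0.850.
```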
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
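The log reports an SVM trained on the 24 + 63 oracle-labelled samples and used to split the remaining 829 vectors into predicted matches and non-matches. As a dependency-free stand-in for the SVM step (a nearest-centroid split rather than a true SVM; names are illustrative), the partitioning idea looks like this:

```python
import math

def centroid_split(match_train, non_match_train, unlabelled):
    """Split unlabelled weight vectors by whichever training-class
    centroid is closer (a simple stand-in for the SVM in the log)."""
    def centroid(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    c_match = centroid(match_train)
    c_non = centroid(non_match_train)
    matches, non_matches = [], []
    for vec in unlabelled:
        if math.dist(vec, c_match) <= math.dist(vec, c_non):
            matches.append(vec)
        else:
            non_matches.append(vec)
    return matches, non_matches
```

In the logged run each split produces two new clusters (here 123 and 706 vectors) that are pushed back onto the queue for further refinement.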
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
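The farthest-first selections listed throughout this log can be sketched as a greedy traversal: start from one vector and repeatedly add the vector whose minimum distance to the already-selected set is largest. This sketch assumes Euclidean distance and an arbitrary start index; the logged program's exact variant is not shown:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    minimum Euclidean distance to the selected set is largest."""
    selected = [vectors[start]]
    # Minimum distance from each vector to the selected set so far.
    min_d = [math.dist(v, selected[0]) for v in vectors]
    for _ in range(k - 1):
        idx = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], math.dist(v, vectors[idx]))
    return selected
```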
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(15)752_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 752), dtype: object
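The precision, recall, and f-measure in the block above are consistent with the reported tp / fp / fn counts; a minimal sketch of the standard pairwise metrics:

```python
def precision_recall(tp, fp, fn):
    """Standard pairwise precision and recall from match counts."""
    return tp / (tp + fp), tp / (tp + fn)

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# tp=45, fp=1, fn=254 -> precision ≈ 0.978261, recall ≈ 0.150502,
# f-measure ≈ 0.26087, matching the logged values.
```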
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)752_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1067
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1067 weight vectors
Containing 221 true matches and 846 true non-matches
(20.71% true matches)
Identified 1011 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 975 (96.44%)
2 : 33 (3.26%)
3 : 2 (0.20%)
20 : 1 (0.10%)
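The occurrence distribution above (975 singletons, 33 pairs, 2 triples, and one vector occurring 20 times: 975 + 66 + 6 + 20 = 1067 vectors, 1011 unique) can be computed with two nested counts; a sketch with illustrative names:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """How often each distinct weight vector occurs, and how many
    distinct vectors share each occurrence count."""
    per_vector = Counter(map(tuple, weight_vectors))  # vector -> count
    per_count = Counter(per_vector.values())          # count -> #vectors
    return per_vector, per_count
```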
Identified 1 non-pure unique weight vector (from 1011 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 825
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1066
Number of unique weight vectors: 1011
Time to load and analyse the weight vector file: 0.01 sec
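Removing the minority-class copies of the single non-pure unique vector (pureness 0.950, i.e. one non-match copy among 20 occurrences) is what reduces 1067 vectors to 1066 here. A sketch, assuming ties break toward the match class (the logged program's tie rule is not shown):

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """For each unique weight vector, drop the copies carrying the
    minority label so every unique vector becomes pure."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept_vectors, kept_labels = [], []
    for vec, labs in groups.items():
        majority = sum(labs) * 2 >= len(labs)  # True wins ties (assumed)
        for lab in labs:
            if lab == majority:
                kept_vectors.append(list(vec))
                kept_labels.append(lab)
    return kept_vectors, kept_labels
```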
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1011, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1011 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1011 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 924 weight vectors
Based on 23 matches and 64 non-matches
Classified 106 matches and 818 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(106, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(818, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 106 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 106 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 44 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(10)337_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984127
recall 0.207358
f-measure 0.342541
da 63
dm 0
ndm 0
tp 62
fp 1
tn 4.76529e+07
fn 237
Name: (10, 1 - acm diverg, 337), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)337_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 450
Number of entity ID pairs that occurred more than once: 0
Analyse set of 450 weight vectors
Containing 199 true matches and 251 true non-matches
(44.22% true matches)
Identified 418 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 402 (96.17%)
2 : 13 (3.11%)
3 : 2 (0.48%)
16 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 418 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 248
Removed 1 non-pure weight vector
Final number of weight vectors to use: 449
Number of unique weight vectors: 418
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(418, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 418 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 418 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 37 matches and 41 non-matches
Purity of oracle classification: 0.526
Entropy of oracle classification: 0.998
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 340 weight vectors
Based on 37 matches and 41 non-matches
Classified 133 matches and 207 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
(207, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
Current size of match and non-match training data sets: 37 / 41
Selected cluster with (queue ordering: random):
- Purity 0.53 and entropy 1.00
- Size 133 weight vectors
- Estimated match proportion 0.474
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 133 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 50 matches and 6 non-matches
Purity of oracle classification: 0.893
Entropy of oracle classification: 0.491
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
63.0
Analysing file: diverg(20)882_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 882), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)882_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
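The farthest-first selection used above is a standard greedy traversal; a sketch follows. The Euclidean metric and the random choice of the starting vector are assumptions, since the log does not state them:

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int, seed: int = 0) -> list:
    """Greedy farthest-first traversal: start from a random vector, then
    repeatedly add the vector whose minimum Euclidean distance to the
    already-selected set is largest."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vectors)))]   # assumption: random start
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))             # farthest from the selection
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1)
        )
    return selected
```

Selecting k = 4 from the four corners of the unit square returns all four points, since each greedy step picks a vector at maximal distance from those already chosen.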
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
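This label-propagation step trains a binary classifier on the oracle-labelled sample and predicts the rest of the cluster. A sketch under the assumption of a scikit-learn RBF-kernel SVM; the synthetic training and remainder data, kernel, and parameters are stand-ins, since the log shows none of them:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Synthetic stand-ins for the 24 match / 63 non-match training vectors
# and the 829 still-unlabelled weight vectors of the cluster.
matches = rng.uniform(0.7, 1.0, size=(24, 7))
non_matches = rng.uniform(0.0, 0.4, size=(63, 7))
X_train = np.vstack([matches, non_matches])
y_train = np.array([True] * 24 + [False] * 63)
remainder = rng.uniform(0.0, 1.0, size=(829, 7))

clf = SVC(kernel="rbf")        # assumption: RBF kernel, default parameters
clf.fit(X_train, y_train)
pred = clf.predict(remainder)
print(f"Classified {int(pred.sum())} matches and {int((~pred).sum())} non-matches")
```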
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
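The size/purity/entropy/match-proportion tuples in the queue are derived from the oracle counts of the sampled vectors: purity is the majority-class fraction and entropy the binary Shannon entropy of the match proportion. A minimal sketch reproducing the Loop 2 values (24 matches, 63 non-matches):

```python
import math

def cluster_stats(num_matches: int, num_non_matches: int):
    """Purity is the majority-class fraction; entropy is the binary
    Shannon entropy (in bits) of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total                    # estimated match proportion
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy, p

purity, entropy, p = cluster_stats(24, 63)
print(purity, entropy, p)
```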
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(10)425_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (10, 1 - acm diverg, 425), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)425_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 301
Number of entity ID pairs that occurred more than once: 0
Analyse set of 301 weight vectors
Containing 203 true matches and 98 true non-matches
(67.44% true matches)
Identified 270 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 257 (95.19%)
2 : 10 (3.70%)
3 : 2 (0.74%)
18 : 1 (0.37%)
Identified 1 non-pure unique weight vector (from 270 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 97
Removed 1 non-pure weight vector
Final number of weight vectors to use: 300
Number of unique weight vectors: 270
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(270, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 270 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 71
Perform initial selection using "far" method
Farthest first selection of 71 weight vectors from 270 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 37 matches and 34 non-matches
Purity of oracle classification: 0.521
Entropy of oracle classification: 0.999
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 34
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 199 weight vectors
Based on 37 matches and 34 non-matches
Classified 141 matches and 58 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 71
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.5211267605633803, 0.9987117514654895, 0.5211267605633803)
(58, 0.5211267605633803, 0.9987117514654895, 0.5211267605633803)
Current size of match and non-match training data sets: 37 / 34
Selected cluster with (queue ordering: random):
- Purity 0.52 and entropy 1.00
- Size 141 weight vectors
- Estimated match proportion 0.521
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 141 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.950, 0.923, 0.941] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 48 matches and 9 non-matches
Purity of oracle classification: 0.842
Entropy of oracle classification: 0.629
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
48.0
Analysing the file: diverg(10)933_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 933), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)933_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 274
Number of entity ID pairs that occurred more than once: 0
Analyse set of 274 weight vectors
Containing 171 true matches and 103 true non-matches
(62.41% true matches)
Identified 256 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 247 (96.48%)
2 : 6 (2.34%)
3 : 2 (0.78%)
9 : 1 (0.39%)
Identified 1 non-pure unique weight vector (from 256 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 153
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 102
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 265
Number of unique weight vectors: 255
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(255, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 255 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 70
Perform initial selection using "far" method
Farthest first selection of 70 weight vectors from 255 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 35 matches and 35 non-matches
Purity of oracle classification: 0.500
Entropy of oracle classification: 1.000
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 35
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 185 weight vectors
Based on 35 matches and 35 non-matches
Classified 120 matches and 65 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 70
Size, purity, entropy, and estimated match proportion of clusters in queue:
(120, 0.5, 1.0, 0.5)
(65, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 35 / 35
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 120 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 120 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 47 matches and 7 non-matches
Purity of oracle classification: 0.870
Entropy of oracle classification: 0.556
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing file: diverg(15)136_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.99
recall 0.331104
f-measure 0.496241
da 100
dm 0
ndm 0
tp 99
fp 1
tn 4.76529e+07
fn 200
Name: (15, 1 - acm diverg, 136), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)136_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1039
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1039 weight vectors
Containing 167 true matches and 872 true non-matches
(16.07% true matches)
Identified 1000 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 971 (97.10%)
2 : 26 (2.60%)
3 : 2 (0.20%)
10 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1000 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 148
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 851
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1038
Number of unique weight vectors: 1000
Time to load and analyse the weight vector file: 0.01 sec
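The analysis steps logged above (counting duplicate weight vectors, computing per-vector pureness as the fraction of matches, and removing the minority-class copies of non-pure vectors) could be sketched roughly as follows. The toy data and the 0.5 majority threshold are illustrative assumptions, not details taken from the actual implementation:

```python
from collections import Counter

# Toy list of (weight vector, is_match) pairs; values are illustrative.
data = [
    ((1.0, 1.0), True), ((1.0, 1.0), True),
    ((0.1, 0.2), False), ((0.1, 0.2), False), ((0.1, 0.2), False),
    ((0.9, 0.8), True), ((0.9, 0.8), False),   # a non-pure vector
    ((0.0, 0.0), False),
]

# Frequency distribution: occurrence count -> number of unique vectors
counts = Counter(vec for vec, _ in data)
freq = Counter(counts.values())

# Pureness (fraction of matches) for each unique vector
match_counts = Counter(vec for vec, is_match in data if is_match)
pureness = {vec: match_counts[vec] / n for vec, n in counts.items()}

# Drop minority-class copies of non-pure vectors (0 < pureness < 1)
non_pure = {v for v, p in pureness.items() if 0.0 < p < 1.0}
cleaned = [(v, m) for v, m in data
           if v not in non_pure or m == (pureness[v] >= 0.5)]
```

With this toy input, one copy of the non-pure vector `(0.9, 0.8)` is removed, mirroring the "Removed 1 non-pure weight vector" step in the log.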
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1000, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1000 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1000 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
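The "farthest first" selection used above is a greedy traversal: starting from an arbitrary vector, it repeatedly picks the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and a random starting vector (the actual implementation may differ):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedily select k row indices from an (n, d) array so that each
    new pick maximizes its distance to the nearest already-selected row."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    selected = [int(rng.integers(n))]      # arbitrary starting vector
    # distance of every vector to its nearest selected vector
    dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))         # farthest from current selection
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

This spreads the sample across the cluster, which is why the selected vectors above mix clear matches, clear non-matches, and borderline cases.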
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
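The purity and entropy figures reported here are consistent with the majority-class fraction and the binary Shannon entropy of the match/non-match split (64/87 ≈ 0.736 and H(23/87) ≈ 0.833). A small helper reproducing them, under that assumption about the definitions:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity: fraction of the majority class.
    Entropy: binary Shannon entropy (base 2) of the class split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

Note that the estimated match proportion carried into the next loop (0.2644 = 23/87) is simply the minority fraction of the same split.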
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 913 weight vectors
Based on 23 matches and 64 non-matches
Classified 57 matches and 856 non-matches
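The SVM split of the 913 remaining weight vectors, trained on the 23 oracle-labelled matches and 64 non-matches, could look roughly like the sketch below. scikit-learn, the linear kernel, and the synthetic feature ranges are all assumptions for illustration, not details from the log:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Synthetic stand-ins for the oracle-labelled training vectors
train_m = rng.uniform(0.6, 1.0, size=(23, 7))   # labelled matches
train_n = rng.uniform(0.0, 0.5, size=(64, 7))   # labelled non-matches
X = np.vstack([train_m, train_n])
y = np.array([1] * len(train_m) + [0] * len(train_n))

clf = SVC(kernel="linear").fit(X, y)

# Classify the unlabelled remainder of the cluster and split it in two
remaining = rng.uniform(0.0, 1.0, size=(913, 7))
pred = clf.predict(remaining)
matches = remaining[pred == 1]
non_matches = remaining[pred == 0]
```

The two resulting sub-clusters are then pushed onto the queue, which is why the next loop reports "Queue length: 2".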
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(57, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(856, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 57 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 33
Farthest first selection of 33 weight vectors from 57 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 1.000, 1.000, 0.952, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[0.420, 1.000, 1.000, 1.000, 1.000, 1.000, 0.947] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
Perform oracle with 100.00% accuracy on 33 weight vectors
The oracle will correctly classify 33 weight vectors and wrongly classify 0
Classified 33 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 33 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing file: diverg(20)894_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 894), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)894_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1035
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1035 weight vectors
Containing 223 true matches and 812 true non-matches
(21.55% true matches)
Identified 981 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 944 (96.23%)
2 : 34 (3.47%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 981 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 791
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1034
Number of unique weight vectors: 981
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(981, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 981 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 981 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 894 weight vectors
Based on 28 matches and 59 non-matches
Classified 156 matches and 738 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(738, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 738 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 738 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 5 matches and 70 non-matches
Purity of oracle classification: 0.933
Entropy of oracle classification: 0.353
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)146_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 146), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)146_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 717
Number of entity ID pairs that occurred more than once: 0
Analyse set of 717 weight vectors
Containing 217 true matches and 500 true non-matches
(30.26% true matches)
Identified 681 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 662 (97.21%)
2 : 16 (2.35%)
3 : 2 (0.29%)
17 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 681 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 497
Removed 1 non-pure weight vector
Final number of weight vectors to use: 716
Number of unique weight vectors: 681
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(681, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 681 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 681 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 34 matches and 50 non-matches
Purity of oracle classification: 0.595
Entropy of oracle classification: 0.974
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 597 weight vectors
Based on 34 matches and 50 non-matches
Classified 288 matches and 309 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(288, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
(309, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
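The purity, entropy, and estimated match proportion printed for each queued cluster are consistent with the usual two-class definitions: majority-class fraction and binary Shannon entropy of the match proportion. A minimal sketch that reproduces the logged values from the 34 matches and 50 non-matches — an illustration of the definitions, not the tool's actual implementation:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Majority-class purity, binary Shannon entropy, and match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total            # estimated match proportion
    purity = max(p, 1.0 - p)           # fraction of the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy, p

# Reproduces the values logged for the 34-match / 50-non-match sample:
purity, entropy, prop = cluster_stats(34, 50)
print(purity, entropy, prop)  # ~0.5952, ~0.9737, ~0.4048
```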
Current size of match and non-match training data sets: 34 / 50
Selected cluster with (queue ordering: random):
- Purity 0.60 and entropy 0.97
- Size 288 weight vectors
- Estimated match proportion 0.405
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 288 vectors
The selected farthest weight vectors are:
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 45 matches and 25 non-matches
Purity of oracle classification: 0.643
Entropy of oracle classification: 0.940
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
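The farthest-first selection used throughout these runs is, under a standard reading, greedy k-center sampling: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch — the seed choice and the Euclidean metric are assumptions; the tool's actual distance and tie-breaking may differ:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first (k-center) selection of k vector indices."""
    selected = [seed_index]
    # min_dist[j]: distance from vector j to its nearest selected vector
    min_dist = [euclidean(v, vectors[seed_index]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(nxt)
        for j, v in enumerate(vectors):
            min_dist[j] = min(min_dist[j], euclidean(v, vectors[nxt]))
    return selected
```

On a run like the one above, this picks a sample (e.g. 70 of 288 vectors) spread across the similarity space before the sample is handed to the oracle for manual classification.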
42.0
Analysing the file: diverg(20)469_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 469), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)469_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1069
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1069 weight vectors
Containing 221 true matches and 848 true non-matches
(20.67% true matches)
Identified 1013 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 977 (96.45%)
2 : 33 (3.26%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vectors (from 1013 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 827
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1068
Number of unique weight vectors: 1013
Time to load and analyse the weight vector file: 0.01 sec
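The duplicate-vector analysis above (frequency distribution of occurrences, plus per-vector pureness) can be sketched by grouping identical weight vectors and taking the match fraction within each group. That definition of "pureness" is an assumption, though it is consistent with the logged values:

```python
from collections import Counter, defaultdict

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors; report how often each distinct
    vector occurs and how pure (fraction of matches) each group is."""
    groups = defaultdict(list)
    for vec, is_match in zip(vectors, labels):
        groups[tuple(vec)].append(is_match)
    # Occurrence count -> number of distinct vectors occurring that often
    freq = Counter(len(lab) for lab in groups.values())
    # Pureness: fraction of a vector's occurrences that are true matches
    pureness = {vec: sum(lab) / len(lab) for vec, lab in groups.items()}
    non_pure = [vec for vec, p in pureness.items() if 0.0 < p < 1.0]
    return freq, pureness, non_pure
```

Removing the minority-class rows of each non-pure group (as the log reports, e.g. 1 of 1069 vectors removed) then leaves every distinct vector with pureness exactly 0.0 or 1.0.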
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1013, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1013 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1013 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 926 weight vectors
Based on 27 matches and 60 non-matches
Classified 142 matches and 784 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(784, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 784 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 784 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.067, 0.550, 0.636, 0.500, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 8 matches and 66 non-matches
Purity of oracle classification: 0.892
Entropy of oracle classification: 0.494
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(10)775_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 775), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)775_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1005
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1005 weight vectors
Containing 195 true matches and 810 true non-matches
(19.40% true matches)
Identified 963 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 928 (96.37%)
2 : 32 (3.32%)
3 : 2 (0.21%)
7 : 1 (0.10%)
Identified 0 non-pure unique weight vectors (from 963 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.000 : 790
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1005
Number of unique weight vectors: 963
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(963, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 963 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 963 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 876 weight vectors
Based on 28 matches and 59 non-matches
Classified 138 matches and 738 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(138, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(738, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 738 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 738 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 5 matches and 70 non-matches
Purity of oracle classification: 0.933
Entropy of oracle classification: 0.353
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
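The purity and entropy figures in oracle blocks like the one above follow the usual binary definitions: purity is the majority-class fraction of the sample, and entropy is the Shannon entropy of the match/non-match split. A minimal sketch (function name is my own) that reproduces the numbers reported above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1 - p)
    # Shannon entropy in bits; the 0*log(0) boundary terms are taken as 0.
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The oracle block above: 5 matches, 70 non-matches.
purity, entropy = purity_entropy(5, 70)
print(round(purity, 3), round(entropy, 3))  # 0.933 0.353
```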
Reached end of manual classification budget
70.0
Analyzing file: diverg(20)126_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 126), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)126_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1035
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1035 weight vectors
Containing 223 true matches and 812 true non-matches
(21.55% true matches)
Identified 981 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 944 (96.23%)
2 : 34 (3.47%)
3 : 2 (0.20%)
17 : 1 (0.10%)
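A frequency table like this one can be derived by counting identical vectors, e.g. with `collections.Counter` over hashable tuples (a sketch; the tool's actual data structures are not shown in the log):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    vector_counts = Counter(tuple(v) for v in weight_vectors)
    return Counter(vector_counts.values())

# Toy input: one vector occurring twice, two vectors occurring once.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.1)]
print(sorted(occurrence_distribution(vectors).items()))  # [(1, 2), (2, 1)]
```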
Identified 1 non-pure unique weight vector (from 981 unique weight vectors)
Pureness (as fraction of matches) for a given unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 791
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1034
Number of unique weight vectors: 981
Time to load and analyse the weight vector file: 0.01 sec
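The pureness filtering reported in the block above apparently groups identical weight vectors, computes the fraction of matches per group, and drops the minority-class copies of any group that is not fully pure. A sketch under that reading (all names are mine):

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """Drop minority-class copies of duplicated, non-pure weight vectors.

    labelled_vectors: iterable of (vector_tuple, is_match) pairs.
    Returns the kept pairs; a non-pure group loses its minority-label copies.
    """
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[vec].append(is_match)
    kept = []
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)   # fraction of matches in the group
        majority = pureness >= 0.5             # majority label of the group
        for is_match in labels:
            if pureness in (0.0, 1.0) or is_match == majority:
                kept.append((vec, is_match))
    return kept

# Mirrors the log: a vector occurring 17 times with pureness 16/17 = 0.941
# loses its single minority-class copy.
data = [((1.0, 0.9), True)] * 16 + [((1.0, 0.9), False)]
print(len(remove_minority_copies(data)))  # 16
```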
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(981, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 981 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 981 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
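Farthest-first traversal, as used for the selections above, greedily picks the vector that maximises the minimum distance to everything chosen so far. A plain-Python sketch (the tool's distance metric is not stated in the log; Euclidean distance and seeding with the first vector are assumptions):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]                 # seed with the first vector
    min_dist = [dist(vectors[0], v) for v in vectors]
    while len(selected) < k:
        # Pick the vector farthest from its nearest already-selected vector.
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):     # refresh distance to nearest pick
            min_dist[i] = min(min_dist[i], dist(vectors[idx], v))
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(pts, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```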
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 894 weight vectors
Based on 28 matches and 59 non-matches
Classified 156 matches and 738 non-matches
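The SVM step above trains on the oracle-labelled samples and classifies the remaining vectors of the cluster, splitting it into a predicted-match and a predicted-non-match child (which then re-enter the queue, as Loop 2 shows). The log does not show the implementation; a sketch assuming scikit-learn's `SVC` is available (the linear kernel and synthetic data are my assumptions):

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Split a cluster's remaining vectors via an SVM trained on oracle labels."""
    clf = SVC(kernel="linear")      # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    pred = clf.predict(remaining_vectors)
    return remaining_vectors[pred == 1], remaining_vectors[pred == 0]

# Synthetic 7-dimensional weight vectors: high similarities for matches,
# low similarities for non-matches.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.uniform(0.7, 1.0, (10, 7)),
                     rng.uniform(0.0, 0.3, (10, 7))])
y_train = np.array([1] * 10 + [0] * 10)
X_rest = np.vstack([rng.uniform(0.7, 1.0, (5, 7)),
                    rng.uniform(0.0, 0.3, (5, 7))])
matches, non_matches = svm_split(X_train, y_train, X_rest)
print(len(matches), len(non_matches))
```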
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(738, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 738 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 738 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 5 matches and 70 non-matches
Purity of oracle classification: 0.933
Entropy of oracle classification: 0.353
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
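Putting the pieces together, each per-file run above follows a budgeted active-learning loop: pop a cluster from the queue, sample it (farthest-first), query the oracle, then either accept the cluster or split it and re-queue the children, until the manual-classification budget is exhausted. A runnable skeleton of that control flow (all names are mine; the lambdas are toy stand-ins for the sampling, oracle, and SVM steps shown elsewhere in the log):

```python
def active_learning_run(root_cluster, budget, sample, oracle, split,
                        is_pure_enough):
    """Budgeted cluster-refinement loop mirroring the log's structure."""
    queue = [root_cluster]          # clusters still awaiting inspection
    used = 0                        # manual oracle classifications performed
    while queue and used < budget:
        cluster = queue.pop(0)      # the log orders the queue randomly
        picks = sample(cluster)     # e.g. farthest-first selection
        labels = [oracle(v) for v in picks]
        used += len(picks)
        rest = [v for v in cluster if v not in picks]
        if is_pure_enough(labels) or not rest:
            continue                # cluster accepted as classified
        queue.extend(split(rest, picks, labels))  # e.g. SVM split into children
    return used

# Toy run: "vectors" are numbers, a match is any value > 0.5.
used = active_learning_run(
    [0.1, 0.9, 0.6, 0.2, 0.7, 0.3], budget=6,
    sample=lambda c: c[:2],
    oracle=lambda v: v > 0.5,
    split=lambda rest, picks, labels: [[v for v in rest if v > 0.5],
                                       [v for v in rest if v <= 0.5]],
    is_pure_enough=lambda labels: len(set(labels)) == 1,
)
print(used)  # 6
```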
42.0
Analyzing file: diverg(10)898_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.977444
recall 0.434783
f-measure 0.601852
da 133
dm 0
ndm 0
tp 130
fp 3
tn 4.76529e+07
fn 169
Name: (10, 1 - acm diverg, 898), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)898_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 598
Number of entity ID pairs that occurred more than once: 0
Analyse set of 598 weight vectors
Containing 131 true matches and 467 true non-matches
(21.91% true matches)
Identified 585 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 577 (98.63%)
2 : 5 (0.85%)
3 : 2 (0.34%)
5 : 1 (0.17%)
Identified 0 non-pure unique weight vectors (from 585 unique weight vectors)
Pureness (as fraction of matches) for a given unique weight vector:
Pureness : Count
1.000 : 118
0.000 : 467
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 598
Number of unique weight vectors: 585
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(585, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 585 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 585 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 30 matches and 52 non-matches
Purity of oracle classification: 0.634
Entropy of oracle classification: 0.947
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 503 weight vectors
Based on 30 matches and 52 non-matches
Classified 76 matches and 427 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(76, 0.6341463414634146, 0.9474351361840306, 0.36585365853658536)
(427, 0.6341463414634146, 0.9474351361840306, 0.36585365853658536)
Current size of match and non-match training data sets: 30 / 52
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 427 weight vectors
- Estimated match proportion 0.366
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 427 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.409, 0.654, 0.500, 0.516, 0.333] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.500, 0.452, 0.632, 0.714, 0.667] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.786, 0.833, 0.545, 0.478, 0.346] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 8 matches and 66 non-matches
Purity of oracle classification: 0.892
Entropy of oracle classification: 0.494
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
133.0
Analyzing file: diverg(20)834_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 834), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)834_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as fraction of matches) for a given unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
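The purity and entropy figures reported above follow directly from the match / non-match counts: purity is the majority-class fraction, entropy the binary Shannon entropy of the match proportion. A minimal sketch (the function name is ours, not the tool's):

```python
import math

def purity_entropy(matches, non_matches):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy (in bits) of the match proportion."""
    total = matches + non_matches
    p = matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# 25 matches, 63 non-matches, as in the oracle step above
purity, entropy = purity_entropy(25, 63)
print(f"{purity:.3f} {entropy:.3f}")  # 0.716 0.861
```

These are exactly the values (0.716, 0.861) that reappear in the cluster-queue tuples below.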
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 817 non-matches
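The SVM step trains on the oracle-labelled vectors and splits the remaining unlabelled ones into predicted matches and non-matches. A sketch using scikit-learn; the kernel choice and function names are assumptions, not necessarily those of the logged run:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labeled_vecs, labels, unlabeled_vecs):
    """Fit a binary SVM on oracle-labelled weight vectors and split the
    unlabelled remainder into predicted matches and non-matches."""
    clf = SVC(kernel="linear")  # kernel is an assumption
    clf.fit(np.asarray(labeled_vecs), np.asarray(labels))
    pred = clf.predict(np.asarray(unlabeled_vecs))
    matches = [v for v, p in zip(unlabeled_vecs, pred) if p]
    non_matches = [v for v, p in zip(unlabeled_vecs, pred) if not p]
    return matches, non_matches
```

In the log, the 88 labelled vectors (25 matches / 63 non-matches) yield a split of the 948 remaining vectors into 131 predicted matches and 817 predicted non-matches, which then become the two clusters in the queue.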
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 131 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
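The farthest-first selections logged above can be sketched as a greedy traversal: start from one vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. Euclidean distance and the starting index are assumptions here:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of weight vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [start]
    # distance of every vector to its closest selected vector
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, dist(v, vectors[nxt]))
                    for d, v in zip(min_dist, vectors)]
    return selected
```

This spreads the sample across the cluster, which is why the selected vectors above mix clear matches (mostly high similarities) with clear non-matches (low similarities).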
Reached end of manual classification budget
39.0
Analysing file: diverg(20)403_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 403), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)403_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
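The uniqueness and frequency statistics above amount to counting exact duplicates among the weight vectors; a minimal sketch (names are ours):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each exact weight vector occurs and tabulate how
    many unique vectors occur once, twice, etc."""
    vec_counts = Counter(tuple(v) for v in weight_vectors)
    freq = Counter(vec_counts.values())
    return len(vec_counts), dict(sorted(freq.items()))

vecs = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.9]]
n_unique, freq = occurrence_distribution(vecs)
print(n_unique, freq)  # 3 {1: 2, 2: 1}
```

For the file above this yields 1036 unique vectors out of 1082, with 1001 occurring once, 32 twice, 2 three times, and 1 twelve times.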
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 24 matches and 64 non-matches
Purity of oracle classification: 0.727
Entropy of oracle classification: 0.845
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
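Since this run uses a 100.00%-accurate oracle, no labels are ever flipped (false matches and false non-matches stay at 0). A lower-accuracy oracle could be simulated with an independent-flip model; the names and the flip model are assumptions:

```python
import random

def simulate_oracle(true_labels, accuracy, seed=1):
    """Return each true label unchanged with probability `accuracy`,
    otherwise flipped. At accuracy 1.0 this reproduces the
    'wrongly classify 0' behaviour in the log."""
    rng = random.Random(seed)
    return [lbl if rng.random() < accuracy else not lbl
            for lbl in true_labels]
```

With accuracy below 1.0, the flipped labels would surface in the log as non-zero counts of false matches and false non-matches.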
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 24 matches and 64 non-matches
Classified 95 matches and 853 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(95, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
(853, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
Current size of match and non-match training data sets: 24 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.73 and entropy 0.85
- Size 95 weight vectors
- Estimated match proportion 0.273
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 95 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)203_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 203), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)203_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1039
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1039 weight vectors
Containing 220 true matches and 819 true non-matches
(21.17% true matches)
Identified 983 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 947 (96.34%)
2 : 33 (3.36%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 983 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 798
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1038
Number of unique weight vectors: 983
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(983, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 983 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 983 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 32 matches and 55 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 896 weight vectors
Based on 32 matches and 55 non-matches
Classified 324 matches and 572 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(324, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
(572, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
Current size of match and non-match training data sets: 32 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 324 weight vectors
- Estimated match proportion 0.368
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 324 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 40 matches and 30 non-matches
Purity of oracle classification: 0.571
Entropy of oracle classification: 0.985
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 30
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(20)250_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 250), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)250_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 799
Number of entity ID pairs that occurred more than once: 0
Analyse set of 799 weight vectors
Containing 224 true matches and 575 true non-matches
(28.04% true matches)
Identified 760 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 741 (97.50%)
2 : 16 (2.11%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 760 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 572
Removed 1 non-pure weight vector
Final number of weight vectors to use: 798
Number of unique weight vectors: 760
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(760, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 760 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 760 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
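The "far" selection above is presumably a farthest-first traversal over the 7-dimensional weight vectors: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch using Euclidean distance and a fixed start index (the real code may use a different metric or seed):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Farthest-first traversal: returns the indices of k vectors,
    each chosen to maximise its minimum distance to those already
    selected."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]
    min_dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

print(farthest_first([[0, 0], [10, 0], [0, 10], [1, 1]], 3))  # [0, 1, 2]
```

Keeping a running `min_dist` array makes each iteration O(n), so selecting 85 of 760 vectors is cheap.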
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 675 weight vectors
Based on 29 matches and 56 non-matches
Classified 149 matches and 526 non-matches
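The SVM step above trains on the 29 match / 56 non-match oracle sample and splits the cluster's 675 remaining vectors into the two sub-clusters that appear in the queue in the next loop; a sketch assuming a scikit-learn RBF SVM (the original kernel and parameters are not shown in the log):

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labeled_vecs, labels, unlabeled_vecs):
    """Train an SVM on the oracle-labelled sample, then split the
    cluster's remaining vectors into predicted-match and
    predicted-non-match sub-clusters."""
    clf = SVC(kernel="rbf")
    clf.fit(np.asarray(labeled_vecs, dtype=float), labels)
    pred = clf.predict(np.asarray(unlabeled_vecs, dtype=float))
    matches = [v for v, p in zip(unlabeled_vecs, pred) if p]
    non_matches = [v for v, p in zip(unlabeled_vecs, pred) if not p]
    return matches, non_matches
```

As the Loop 2 output shows, each sub-cluster re-enters the queue carrying the parent sample's purity and entropy until it is sampled itself.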
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(526, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 149 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 149 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 49 matches and 6 non-matches
Purity of oracle classification: 0.891
Entropy of oracle classification: 0.497
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)734_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 734), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)734_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 795
Number of entity ID pairs that occurred more than once: 0
Analyse set of 795 weight vectors
Containing 224 true matches and 571 true non-matches
(28.18% true matches)
Identified 756 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 737 (97.49%)
2 : 16 (2.12%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 756 unique weight vectors)
Pureness (percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 568
Removed 1 non-pure weight vector
Final number of weight vectors to use: 794
Number of unique weight vectors: 756
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(756, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 756 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 756 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 671 weight vectors
Based on 29 matches and 56 non-matches
Classified 147 matches and 524 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(147, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(524, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 524 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 524 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.538, 0.613, 0.789, 0.227, 0.857] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 7 matches and 67 non-matches
Purity of oracle classification: 0.905
Entropy of oracle classification: 0.452
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)16_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 16), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)16_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1027
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1027 weight vectors
Containing 223 true matches and 804 true non-matches
(21.71% true matches)
Identified 973 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 936 (96.20%)
2 : 34 (3.49%)
3 : 2 (0.21%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 973 unique weight vectors)
Pureness (percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 783
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1026
Number of unique weight vectors: 973
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(973, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 973 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 973 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 886 weight vectors
Based on 26 matches and 61 non-matches
Classified 131 matches and 755 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(755, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 755 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 755 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
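The "farthest first" sampling that produced the list above can be sketched as a greedy traversal: start from one vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. A naive sketch (the starting choice and names are ours):

```python
import math

def farthest_first(vectors, k):
    # Greedy farthest-first traversal: each step adds the vector that is
    # farthest from everything selected so far, spreading the sample out
    # over the weight-vector space.
    selected = [vectors[0]]
    while len(selected) < k:
        candidate = max(
            (v for v in vectors if v not in selected),
            key=lambda v: min(math.dist(v, s) for s in selected),
        )
        selected.append(candidate)
    return selected
```

A call such as `farthest_first(cluster_vectors, 73)` mirrors the "Farthest first selection of 73 weight vectors from 755 vectors" step above; this naive version rescans all vectors each step, which is fine at these cluster sizes.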
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 11 matches and 62 non-matches
Purity of oracle classification: 0.849
Entropy of oracle classification: 0.612
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)184_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 184), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)184_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 847
Number of entity ID pairs that occurred more than once: 0
Analyse set of 847 weight vectors
Containing 214 true matches and 633 true non-matches
(25.27% true matches)
Identified 793 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 758 (95.59%)
2 : 32 (4.04%)
3 : 2 (0.25%)
19 : 1 (0.13%)
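The occurrence table above is a nested count: first how often each unique weight vector occurs, then how many unique vectors share each occurrence frequency. A sketch with `collections.Counter` (names are ours):

```python
from collections import Counter

def occurrence_distribution(vectors):
    # Count how often each unique weight vector occurs, then count how
    # many unique vectors share each occurrence frequency.
    per_vector = Counter(tuple(v) for v in vectors)
    return Counter(per_vector.values())
```

For example, three copies of one vector, two of another, and a singleton yield `{3: 1, 2: 1, 1: 1}`, the same shape as the table above.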
Identified 1 non-pure unique weight vector (from 793 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 612
Removed 1 non-pure weight vector
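A unique weight vector is "non-pure" when its copies carry both labels; the clean-up step drops the minority-label copies so every unique vector ends up pure, as reported above. A sketch of that step (all names ours):

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    # labelled_vectors: list of (weight_vector_tuple, is_match) pairs.
    # For each unique vector seen with both labels, keep only the copies
    # carrying its majority label (ties kept as matches here).
    groups = defaultdict(list)
    for vec, label in labelled_vectors:
        groups[vec].append(label)
    kept = []
    for vec, label in labelled_vectors:
        labels = groups[vec]
        majority = labels.count(True) >= labels.count(False)
        if label == majority:
            kept.append((vec, label))
    return kept
```

In the run above one vector occurred 19 times with an 18/1 label split (pureness 0.947), so exactly one minority-label copy was removed, leaving 846 of 847 vectors.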
Final number of weight vectors to use: 846
Number of unique weight vectors: 793
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(793, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 793 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 793 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 708 weight vectors
Based on 28 matches and 57 non-matches
Classified 145 matches and 563 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(563, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 145 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 145 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(10)982_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 982), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)982_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 489
Number of entity ID pairs that occurred more than once: 0
Analyse set of 489 weight vectors
Containing 192 true matches and 297 true non-matches
(39.26% true matches)
Identified 459 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 445 (96.95%)
2 : 11 (2.40%)
3 : 2 (0.44%)
16 : 1 (0.22%)
Identified 1 non-pure unique weight vector (from 459 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 162
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 296
Removed 1 non-pure weight vector
Final number of weight vectors to use: 488
Number of unique weight vectors: 459
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(459, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 459 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 459 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.786, 0.833, 0.545, 0.478, 0.346] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.364, 0.619, 0.471, 0.600, 0.533] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 27 matches and 52 non-matches
Purity of oracle classification: 0.658
Entropy of oracle classification: 0.927
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 380 weight vectors
Based on 27 matches and 52 non-matches
Classified 144 matches and 236 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6582278481012658, 0.9265044456232998, 0.34177215189873417)
(236, 0.6582278481012658, 0.9265044456232998, 0.34177215189873417)
Current size of match and non-match training data sets: 27 / 52
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 236 weight vectors
- Estimated match proportion 0.342
Sample size for this cluster: 63
Farthest first selection of 63 weight vectors from 236 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.750, 0.905, 0.667, 0.500, 0.571] (False)
[1.000, 0.000, 0.579, 0.583, 0.522, 0.417, 0.563] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.813, 0.619, 0.333, 0.500, 0.571] (False)
[1.000, 0.000, 0.500, 0.452, 0.632, 0.714, 0.667] (False)
[0.680, 0.000, 0.609, 0.737, 0.600, 0.529, 0.696] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.522, 0.786, 0.800, 0.824, 0.667] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[1.000, 0.000, 0.923, 0.667, 0.667, 0.412, 0.571] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.565, 0.857, 0.833, 0.412, 0.667] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.750, 0.875, 0.545, 0.750, 0.571] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.786, 0.857, 0.667, 0.412, 0.857] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.600, 0.700, 0.600, 0.611, 0.706] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 63 weight vectors
The oracle will correctly classify 63 weight vectors and wrongly classify 0
Classified 0 matches and 63 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 63 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing the file: diverg(20)271_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 271), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)271_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 754
Number of entity ID pairs that occurred more than once: 0
Analyse set of 754 weight vectors
Containing 222 true matches and 532 true non-matches
(29.44% true matches)
Identified 718 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 699 (97.35%)
2 : 16 (2.23%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 718 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 529
Removed 1 non-pure weight vector
Final number of weight vectors to use: 753
Number of unique weight vectors: 718
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(718, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 718 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 718 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
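The "far" initial-selection method above is a farthest-first traversal (greedy k-center): starting from a seed vector, repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and an arbitrary seed index (the actual seeding rule may differ):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Select k indices by farthest-first traversal (greedy k-center)."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [seed]  # assumption: start from an arbitrary seed index
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = np.linalg.norm(vectors - vectors[seed], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))      # farthest from current selection
        selected.append(nxt)
        d = np.linalg.norm(vectors - vectors[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)  # refresh nearest-selected distances
    return selected
```

This greedy scheme yields a diverse sample, which is why the selected vectors above spread across both matches and non-matches.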
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 28 matches and 56 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
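The purity and entropy lines above follow the standard binary-cluster definitions: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the match proportion. A small sketch reproducing 0.667 and 0.918 for 28 matches / 56 non-matches:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a cluster."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)  # 0·log 0 treated as 0
    return purity, entropy
```

With 28/56 this gives purity 2/3 ≈ 0.667 and entropy ≈ 0.918, matching the log.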
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 634 weight vectors
Based on 28 matches and 56 non-matches
Classified 135 matches and 499 non-matches
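The split step trains a classifier on the oracle-labelled vectors (here 28 matches and 56 non-matches) and partitions the remaining 634 vectors by its predictions. A minimal sketch using scikit-learn; the kernel and parameters are assumptions, not taken from the source:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labeled_vecs, labels, unlabeled_vecs):
    """Split unlabeled weight vectors into predicted matches / non-matches."""
    clf = SVC(kernel="linear")  # assumption: linear kernel, default C
    clf.fit(np.asarray(labeled_vecs), np.asarray(labels))
    pred = clf.predict(np.asarray(unlabeled_vecs))
    matches = [v for v, p in zip(unlabeled_vecs, pred) if p == 1]
    non_matches = [v for v, p in zip(unlabeled_vecs, pred) if p == 0]
    return matches, non_matches
```

The two resulting sets become the two new clusters pushed onto the queue in the next loop.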
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(499, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 28 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.92
- Size 499 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 499 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 13 matches and 60 non-matches
Purity of oracle classification: 0.822
Entropy of oracle classification: 0.676
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)732_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 732), dtype: object
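The precision, recall, and f-measure rows above are consistent with the confusion counts (tp = 39, fp = 0, fn = 260): precision 39/39 = 1, recall 39/299 ≈ 0.130435, and the f-measure their harmonic mean. A small check:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

`prf(39, 0, 260)` reproduces the 1 / 0.130435 / 0.230769 triple reported in the log.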
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)732_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
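The occurrence histogram above (1007 singletons, 34 pairs, 2 triples, and one vector seen 20 times: 1044 unique vectors covering all 1101 rows) is a two-level count. A sketch, assuming each weight vector is a sequence of floats:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in vectors)  # count each unique vector
    return Counter(per_vector.values())              # histogram of those counts
```

Summing `count * num_vectors` over the histogram recovers the original row total.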
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
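A non-pure unique weight vector is one whose identical copies carry both labels; here the vector occurring 20 times has pureness 19/20 = 0.950, and its single minority-class row is dropped (1101 → 1100). A sketch of that filter, assuming rows are (vector, label) pairs with 0/1 labels and ignoring exact ties:

```python
from collections import defaultdict

def remove_minority_rows(rows):
    """Drop minority-class rows of unique vectors that have mixed labels."""
    by_vec = defaultdict(list)
    for vec, label in rows:
        by_vec[tuple(vec)].append(label)
    kept = []
    for vec, label in rows:
        labels = by_vec[tuple(vec)]
        majority_is_match = sum(labels) * 2 > len(labels)
        if bool(label) == majority_is_match:  # keep only majority-class copies
            kept.append((vec, label))
    return kept
```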
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00 accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)440_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 440), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)440_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1064
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1064 weight vectors
Containing 209 true matches and 855 true non-matches
(19.64% true matches)
Identified 1017 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.56%)
2 : 32 (3.15%)
3 : 2 (0.20%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1017 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 834
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1063
Number of unique weight vectors: 1017
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1017, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1017 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1017 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 930 weight vectors
Based on 30 matches and 57 non-matches
Classified 232 matches and 698 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(232, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(698, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 698 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 698 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
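The purity and entropy figures logged after each oracle round are consistent with the majority-class fraction and the binary Shannon entropy of the match proportion. A minimal sketch (the function name is mine, not from the tool); the 25/63 split from a later oracle round reproduces the logged 0.716 / 0.861:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class fraction and binary Shannon entropy of a
    match/non-match split, as logged after each oracle round."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 25 matches / 63 non-matches reproduce the values logged in Loop 1 below
purity, entropy = purity_entropy(25, 63)
print(round(purity, 3), round(entropy, 3))  # 0.716 0.861
```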
57.0
Analysing the file: diverg(20)654_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 654), dtype: object
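The precision, recall, and f-measure rows in these per-file summaries follow the standard definitions from the tp/fp/fn counts; a sketch (the helper name is mine), checked against the values above:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from match counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# tp=57, fp=0, fn=242 reproduce the summary logged for diverg(20)654_NEW.csv
p, r, f = prf(57, 0, 242)
print(round(p, 6), round(r, 6), round(f, 6))  # 1.0 0.190635 0.320225
```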
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)654_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1075
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1075 weight vectors
Containing 208 true matches and 867 true non-matches
(19.35% true matches)
Identified 1028 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 993 (96.60%)
2 : 32 (3.11%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vectors (from 1028 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1074
Number of unique weight vectors: 1028
Time to load and analyse the weight vector file: 0.01 sec
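The occurrence histogram above (how many weight vectors appear once, twice, three times, ...) can be reproduced with two nested Counters; a sketch with toy stand-in data, not the actual CSV contents:

```python
from collections import Counter

# Toy stand-in for the list of weight-vector tuples read from the CSV
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9),
           (0.2, 0.9), (0.2, 0.9), (0.0, 0.1)]

vector_counts = Counter(vectors)             # occurrences per unique vector
freq_dist = Counter(vector_counts.values())  # occurrence : number of vectors

print(len(vector_counts))         # 3 unique weight vectors
print(sorted(freq_dist.items()))  # [(1, 1), (2, 1), (3, 1)]
```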
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1028, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1028 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1028 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
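The "far" initial selection logged above is a farthest-first traversal: start from one vector and repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal O(n·k) sketch; the real tool's seed choice and distance metric are assumptions here:

```python
def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors.

    Euclidean distance and first-vector seeding are assumptions; the
    actual tool may seed and measure distance differently.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [vectors[0]]
    # Minimum squared distance of every vector to the selected set
    min_d = [dist2(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=min_d.__getitem__)
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            d = dist2(v, vectors[idx])
            if d < min_d[i]:
                min_d[i] = d
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(pts, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```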
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 940 weight vectors
Based on 25 matches and 63 non-matches
Classified 123 matches and 817 non-matches
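The SVM step above trains on the oracle-labelled vectors and splits the remaining cluster by predicted class. As a stand-in for the SVM (to keep the sketch dependency-free), a nearest-centroid classifier shows the same train-on-oracle-labels / predict-on-remainder flow; the function names are mine:

```python
def centroid(vecs):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def split_cluster(train_matches, train_non_matches, remainder):
    """Split the remainder by nearer class centroid (SVM stand-in)."""
    cm, cn = centroid(train_matches), centroid(train_non_matches)

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    matches = [v for v in remainder if d2(v, cm) < d2(v, cn)]
    non_matches = [v for v in remainder if d2(v, cm) >= d2(v, cn)]
    return matches, non_matches

m, n = split_cluster([(1.0, 1.0)], [(0.0, 0.0)],
                     [(0.9, 0.8), (0.1, 0.2)])
print(m, n)  # [(0.9, 0.8)] [(0.1, 0.2)]
```

The actual tool uses an SVM for this split, which draws a maximum-margin boundary rather than a midpoint between centroids; the queueing of the two resulting sub-clusters is the same either way.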
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 817 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 817 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 11 matches and 60 non-matches
Purity of oracle classification: 0.845
Entropy of oracle classification: 0.622
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(15)694_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 694), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)694_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1026
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1026 weight vectors
Containing 198 true matches and 828 true non-matches
(19.30% true matches)
Identified 984 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 949 (96.44%)
2 : 32 (3.25%)
3 : 2 (0.20%)
7 : 1 (0.10%)
Identified 0 non-pure unique weight vectors (from 984 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.000 : 808
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1026
Number of unique weight vectors: 984
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(984, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 984 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 984 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 897 weight vectors
Based on 26 matches and 61 non-matches
Classified 93 matches and 804 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(93, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(804, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 804 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 804 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 14 matches and 59 non-matches
Purity of oracle classification: 0.808
Entropy of oracle classification: 0.705
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(20)305_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 305), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)305_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vectors (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
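The farthest-first selection step logged above can be sketched as follows. This is a minimal sketch assuming Euclidean distance and a fixed starting index; the run's actual distance metric and starting rule are not shown in the log.

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly add the vector whose
    minimum Euclidean distance to the already-selected set is largest.
    Returns the indices of the k selected vectors."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]
    # minimum distance from every vector to the current selected set
    dists = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))  # farthest from everything chosen so far
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

This greedy traversal spreads the sample across the cluster, which is why the selected vectors above mix clear matches and clear non-matches.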
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
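The purity and entropy figures reported above follow directly from the match / non-match counts: purity is the majority-class fraction and entropy is the base-2 Shannon entropy of the binary split. A small sketch:

```python
from math import log2

def purity_entropy(n_match, n_nonmatch):
    """Purity = majority-class fraction of the sample; entropy = Shannon
    entropy (base 2) of the binary match/non-match proportions."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# For the 23 matches / 65 non-matches classified above:
# purity = 65/88 ≈ 0.739, entropy ≈ 0.829, matching the logged values.
```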
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
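The SVM split of the remaining unlabelled vectors can be sketched with scikit-learn's `SVC`, trained on the oracle-labelled sample. The kernel and hyperparameters of the logged run are not shown, so the linear kernel below is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(match_vecs, nonmatch_vecs, unlabeled_vecs):
    """Train on the oracle-labelled weight vectors, then split the
    remaining unlabelled vectors into predicted matches / non-matches."""
    X = np.vstack([match_vecs, nonmatch_vecs])
    y = np.array([1] * len(match_vecs) + [0] * len(nonmatch_vecs))
    clf = SVC(kernel="linear").fit(X, y)  # kernel choice is an assumption
    U = np.asarray(unlabeled_vecs, dtype=float)
    pred = clf.predict(U)
    return U[pred == 1], U[pred == 0]
```

Each predicted side then becomes a new cluster in the queue, which is why the queue length grows to 2 in the next loop.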
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.739, 0.829, 0.261)
(847, 0.739, 0.829, 0.261)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(10)684_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 684), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)684_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 256
Number of entity ID pairs that occurred more than once: 0
Analyse set of 256 weight vectors
Containing 209 true matches and 47 true non-matches
(81.64% true matches)
Identified 225 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 210 (93.33%)
2 : 12 (5.33%)
3 : 2 (0.89%)
16 : 1 (0.44%)
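The occurrence table above maps each frequency to the number of distinct weight vectors occurring exactly that often. It can be computed with two nested `Counter`s; this is a sketch, as the program's own implementation is not shown.

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to the number of distinct weight
    vectors that occur exactly that many times."""
    per_vector = Counter(tuple(v) for v in weight_vectors)     # vector -> count
    return dict(sorted(Counter(per_vector.values()).items()))  # count -> #vectors
```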
Identified 1 non-pure unique weight vector (from 225 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 46
Removed 1 non-pure weight vector
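The removal step logged above (dropping the minority-class copies of each non-pure unique weight vector, so every surviving unique vector is pure) can be sketched as follows; the tie-breaking rule is an assumption.

```python
def remove_minority_copies(records):
    """records: list of (weight_vector_tuple, is_match) pairs. For every
    unique weight vector, keep only the copies labelled with that
    vector's majority class, so each surviving unique vector is pure."""
    labels_by_vec = {}
    for vec, is_match in records:
        labels_by_vec.setdefault(vec, []).append(is_match)
    kept = []
    for vec, is_match in records:
        labels = labels_by_vec[vec]
        majority_is_match = sum(labels) * 2 >= len(labels)  # ties -> match (assumed)
        if is_match == majority_is_match:
            kept.append((vec, is_match))
    return kept
```

For the run above, the vector occurring 16 times with pureness 15/16 = 0.938 loses its single non-match copy, taking 256 records down to 255.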
Final number of weight vectors to use: 255
Number of unique weight vectors: 225
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(225, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 225 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 67
Perform initial selection using "far" method
Farthest first selection of 67 weight vectors from 225 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 67 weight vectors
The oracle will correctly classify 67 weight vectors and wrongly classify 0
Classified 40 matches and 27 non-matches
Purity of oracle classification: 0.597
Entropy of oracle classification: 0.973
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 27
Number of false non-matches: 0
Deleted 67 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 158 weight vectors
Based on 40 matches and 27 non-matches
Classified 158 matches and 0 non-matches
43.0
Analyzing file: diverg(20)35_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 35), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)35_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1064
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1064 weight vectors
Containing 209 true matches and 855 true non-matches
(19.64% true matches)
Identified 1017 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.56%)
2 : 32 (3.15%)
3 : 2 (0.20%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1017 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 834
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1063
Number of unique weight vectors: 1017
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1017, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1017 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1017 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 930 weight vectors
Based on 27 matches and 60 non-matches
Classified 139 matches and 791 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(139, 0.690, 0.894, 0.310)
(791, 0.690, 0.894, 0.310)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 139 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 139 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analyzing file: diverg(20)295_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 295), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)295_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and misclassify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
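The purity and entropy figures reported for each oracle-classified sample appear to be the majority-class fraction and the binary Shannon entropy of the match/non-match split. A minimal sketch reproducing the values above (the function names are illustrative, not taken from the original tool):

```python
import math

def purity(num_matches, num_non_matches):
    """Majority-class fraction of an oracle-classified sample."""
    return max(num_matches, num_non_matches) / (num_matches + num_non_matches)

def entropy(num_matches, num_non_matches):
    """Binary Shannon entropy (bits) of the match/non-match split.
    The match fraction p doubles as the estimated match proportion."""
    p = num_matches / (num_matches + num_non_matches)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Values from the log: 23 matches, 65 non-matches.
print(round(purity(23, 65), 3))   # 0.739
print(round(entropy(23, 65), 3))  # 0.829
print(round(23 / 88, 3))          # 0.261 (estimated match proportion)
```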
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
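The "SVM classification" step trains on the oracle-labelled sample and predicts the remaining weight vectors of the cluster, splitting it into a predicted-match and a predicted-non-match sub-cluster. A hedged sketch using scikit-learn's `SVC` on synthetic stand-in data (the kernel choice and the random training data are assumptions, not taken from the log):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the oracle-labelled sample: 7-dimensional
# similarity weight vectors labelled match (1) / non-match (0).
train_X = rng.random((88, 7))
train_y = (train_X.mean(axis=1) > 0.5).astype(int)

# The remaining unlabelled weight vectors of the cluster.
rest_X = rng.random((956, 7))

clf = SVC(kernel="linear")   # kernel choice is an assumption
clf.fit(train_X, train_y)
pred = clf.predict(rest_X)

print(f"Classified {pred.sum()} matches and {(pred == 0).sum()} non-matches")
```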
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
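The "farthest first" sampling named above is presumably the standard greedy farthest-first traversal: seed with one vector, then repeatedly pick the vector maximising its minimum distance to the vectors chosen so far. A small illustrative sketch (Euclidean distance and the seeding choice are assumptions):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    minimum Euclidean distance to the already-selected set is largest."""
    selected = [vectors[0]]                      # seed with an arbitrary vector
    dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=dist.__getitem__)
        selected.append(vectors[idx])
        # Each vector's distance to its nearest selected vector so far.
        dist = [min(d, math.dist(v, vectors[idx])) for d, v in zip(dist, vectors)]
    return selected

pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
sel = farthest_first(pts, 2)
print(sel)  # [[0.0, 0.0], [5.0, 5.0]]
```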
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and misclassify 0
Classified 43 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)976_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 976), dtype: object
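The precision/recall/f-measure triple in the entry above is consistent with the usual definitions computed from tp=57, fp=0, fn=242:

```python
tp, fp, fn = 57, 0, 242   # counts from the log entry above
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
print(round(recall, 6), round(f_measure, 6))  # 0.190635 0.320225
```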
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)976_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 770
Number of entity ID pairs that occurred more than once: 0
Analyse set of 770 weight vectors
Containing 207 true matches and 563 true non-matches
(26.88% true matches)
Identified 741 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 724 (97.71%)
2 : 14 (1.89%)
3 : 2 (0.27%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 741 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 560
Removed 1 non-pure weight vector
Final number of weight vectors to use: 769
Number of unique weight vectors: 741
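The duplicate-frequency and pureness analysis above can be sketched with a `Counter` over (weight vector, label) records. The miniature data and the majority-vote threshold are illustrative assumptions, not taken from the tool:

```python
from collections import Counter

# Miniature stand-in data: (weight_vector, is_match) records.
records = [
    ((1.0, 0.9), True), ((1.0, 0.9), True), ((1.0, 0.9), False),  # non-pure vector
    ((0.1, 0.2), False), ((0.5, 0.5), True),
]

freq = Counter(vec for vec, _ in records)
# Occurrence : number of unique weight vectors occurring that often
print(sorted(Counter(freq.values()).items()))  # [(1, 2), (3, 1)]

# Pureness of a unique vector = fraction of matches among its records;
# drop the minority-class records of any non-pure vector (majority vote).
matches = Counter(vec for vec, lab in records if lab)
cleaned = [(vec, lab) for vec, lab in records
           if lab == (matches[vec] / freq[vec] >= 0.5)]
print(len(records) - len(cleaned))  # 1 minority record removed
```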
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(741, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 741 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 741 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and misclassify 0
Classified 35 matches and 50 non-matches
Purity of oracle classification: 0.588
Entropy of oracle classification: 0.977
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 656 weight vectors
Based on 35 matches and 50 non-matches
Classified 152 matches and 504 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(152, 0.5882352941176471, 0.9774178175281716, 0.4117647058823529)
(504, 0.5882352941176471, 0.9774178175281716, 0.4117647058823529)
Current size of match and non-match training data sets: 35 / 50
Selected cluster with (queue ordering: random):
- Purity 0.59 and entropy 0.98
- Size 152 weight vectors
- Estimated match proportion 0.412
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 152 vectors
The selected farthest weight vectors are:
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and misclassify 0
Classified 49 matches and 9 non-matches
Purity of oracle classification: 0.845
Entropy of oracle classification: 0.623
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(20)640_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 640), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)640_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1043
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1043 weight vectors
Containing 222 true matches and 821 true non-matches
(21.28% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 952 (96.26%)
2 : 34 (3.44%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 800
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1042
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and misclassify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 902 weight vectors
Based on 26 matches and 61 non-matches
Classified 145 matches and 757 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(757, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 145 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 145 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and misclassify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)381_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 381), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)381_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
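The pureness filter above removes minority-class copies of duplicated weight vectors (e.g. the vector occurring 20 times with pureness 0.950 loses its single non-match copy). A minimal sketch, assuming pureness is simply the match fraction per unique vector and that only minority-class copies are dropped:

```python
from collections import defaultdict

def remove_non_pure(vectors, labels):
    """Drop minority-class copies of any unique vector seen with both labels.

    `vectors` are lists of weights, `labels` are booleans (True = match).
    Returns the filtered (vectors, labels).
    """
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)

    kept_vecs, kept_labs = [], []
    for vec, lab in zip(vectors, labels):
        labs = groups[tuple(vec)]
        pureness = sum(labs) / len(labs)  # fraction of match labels
        majority = pureness >= 0.5        # ties count as match in this sketch
        if lab == majority:
            kept_vecs.append(vec)
            kept_labs.append(lab)
    return kept_vecs, kept_labs

# Toy data: one vector occurs 3 times as 2 matches + 1 non-match (pureness 0.667);
# the single non-match copy is removed.
v = [[1.0], [1.0], [1.0], [0.1]]
y = [True, True, False, False]
fv, fy = remove_non_pure(v, y)
print(len(fv))  # 3
```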
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
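The farthest-first selection announced above is the classic greedy max-min traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A sketch assuming Euclidean distance and an arbitrary starting vector (the actual implementation's distance and tie-breaking may differ):

```python
def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of numeric vectors.

    Returns the indices of the k selected vectors.
    """
    def dist2(a, b):
        # squared Euclidean distance is enough for comparisons
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [start]
    # minimum squared distance from every vector to the selected set
    min_d = [dist2(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], dist2(v, vectors[nxt]))
    return selected

# Toy run: from four 1-D points, pick the 3 mutually farthest.
pts = [[0.0], [0.1], [1.0], [0.5]]
print(farthest_first(pts, 3))  # [0, 2, 3]
```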
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and misclassify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 27 matches and 60 non-matches
Classified 148 matches and 784 non-matches
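The SVM step above trains on the oracle-labelled vectors (27 matches, 60 non-matches) and classifies the rest of the cluster. As a pure-Python stand-in, here is a linear SVM trained by sub-gradient descent on the hinge loss; the actual run presumably uses a library SVM (e.g. scikit-learn), and the data below is a hypothetical toy cluster:

```python
import random

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01, seed=0):
    """Linear SVM via sub-gradient descent on the regularised hinge loss.

    y must be in {-1, +1}; returns (weights, bias).
    """
    rng = random.Random(seed)
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # L2 regularisation shrinks the weights at every step
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:  # hinge-loss sub-gradient for violated margins
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy cluster: high-similarity vectors are matches (+1), low ones are not (-1).
X = [[0.9, 1.0], [0.95, 0.9], [0.1, 0.0], [0.2, 0.1]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1]
```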
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(784, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 784 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 784 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.067, 0.550, 0.636, 0.500, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and misclassify 0
Classified 8 matches and 66 non-matches
Purity of oracle classification: 0.892
Entropy of oracle classification: 0.494
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)383_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 383), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)383_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 797
Number of entity ID pairs that occurred more than once: 0
Analyse set of 797 weight vectors
Containing 225 true matches and 572 true non-matches
(28.23% true matches)
Identified 740 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 703 (95.00%)
2 : 34 (4.59%)
3 : 2 (0.27%)
20 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 740 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 551
Removed 1 non-pure weight vector
Final number of weight vectors to use: 796
Number of unique weight vectors: 740
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(740, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 740 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 740 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and misclassify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 655 weight vectors
Based on 29 matches and 56 non-matches
Classified 329 matches and 326 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(329, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(326, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 329 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 329 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and misclassify 0
Classified 42 matches and 26 non-matches
Purity of oracle classification: 0.618
Entropy of oracle classification: 0.960
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)607_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 607), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)607_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 476
Number of entity ID pairs that occurred more than once: 0
Analyse set of 476 weight vectors
Containing 212 true matches and 264 true non-matches
(44.54% true matches)
Identified 442 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 425 (96.15%)
2 : 14 (3.17%)
3 : 2 (0.45%)
17 : 1 (0.23%)
Identified 1 non-pure unique weight vector (from 442 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 180
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 261
Removed 1 non-pure weight vector
Final number of weight vectors to use: 475
Number of unique weight vectors: 442
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(442, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 442 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 442 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and misclassify 0
Classified 36 matches and 43 non-matches
Purity of oracle classification: 0.544
Entropy of oracle classification: 0.994
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 363 weight vectors
Based on 36 matches and 43 non-matches
Classified 140 matches and 223 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.5443037974683544, 0.9943290455933882, 0.45569620253164556)
(223, 0.5443037974683544, 0.9943290455933882, 0.45569620253164556)
Current size of match and non-match training data sets: 36 / 43
Selected cluster with (queue ordering: random):
- Purity 0.54 and entropy 0.99
- Size 223 weight vectors
- Estimated match proportion 0.456
Sample size for this cluster: 67
Farthest first selection of 67 weight vectors from 223 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.778, 0.636, 0.375, 0.556, 0.625] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
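The farthest-first selection used above is the classic greedy traversal: start from one vector and repeatedly add the vector whose minimum distance to the already selected set is largest. A minimal sketch, assuming Euclidean distance and a random start vector (the tool's actual metric and seeding are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal over a set of weight vectors."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]  # arbitrary start vector
    min_dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < min(k, len(X)):
        nxt = int(np.argmax(min_dist))  # vector farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Each pass keeps `min_dist` as the distance of every vector to its nearest selected vector, so one `argmax` per iteration suffices.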
Perform oracle with 100.00% accuracy on 67 weight vectors
The oracle will correctly classify 67 weight vectors and wrongly classify 0
Classified 5 matches and 62 non-matches
Purity of oracle classification: 0.925
Entropy of oracle classification: 0.383
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 67 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(15)618_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 618), dtype: object
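The precision, recall and f-measure values in the block above follow directly from the raw tp/fp/fn counts; a sketch under that assumption (the helper name `precision_recall_f` is illustrative):

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure from raw classification counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure as 2*tp / (2*tp + fp + fn), equivalent to the harmonic mean.
    f_measure = 2.0 * tp / (2.0 * tp + fp + fn) if tp else 0.0
    return precision, recall, f_measure
```

With tp=40, fp=0, fn=259 this yields precision 1, recall ≈ 0.133779 and f-measure ≈ 0.235988, as reported.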
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)618_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 927
Number of entity ID pairs that occurred more than once: 0
Analyse set of 927 weight vectors
Containing 218 true matches and 709 true non-matches
(23.52% true matches)
Identified 872 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 836 (95.87%)
2 : 33 (3.78%)
3 : 2 (0.23%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 872 unique weight vectors)
Pureness (as a percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 183
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 688
Removed 1 non-pure weight vector
Final number of weight vectors to use: 926
Number of unique weight vectors: 872
Time to load and analyse the weight vector file: 0.01 sec
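The occurrence and pureness analysis above can be sketched as follows; this is a simplified reconstruction (the tool's internal data structures are not visible in the log), where pureness is the fraction of a unique vector's occurrences that are true matches:

```python
from collections import Counter

def analyse_vectors(vectors, labels):
    """Occurrence distribution and pureness of unique weight vectors.
    `labels` holds True for a true match, False for a true non-match."""
    freq = Counter(map(tuple, vectors))
    occurrence_dist = Counter(freq.values())  # occurrence count -> number of vectors
    match_freq = Counter(tuple(v) for v, is_match in zip(vectors, labels) if is_match)
    pureness = {v: match_freq[v] / n for v, n in freq.items()}
    return occurrence_dist, pureness
```

A unique vector with pureness strictly between 0 and 1 (such as the 0.947 entry above) is non-pure: its minority-class occurrences get removed before clustering.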
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(872, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 872 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 872 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 786 weight vectors
Based on 28 matches and 58 non-matches
Classified 157 matches and 629 non-matches
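The SVM step trains on the oracle-labelled vectors and classifies the rest of the cluster into predicted matches and non-matches. A hedged sketch using scikit-learn's `SVC`: the kernel and parameters the tool uses are not visible in the log, so a linear kernel is assumed, and the vectors below are copied from the selections above purely as illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Oracle-labelled 7-dimensional similarity weight vectors (1 = match, 0 = non-match).
train_X = np.array([
    [0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474],  # match
    [0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146],  # match
    [1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385],  # non-match
    [0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765],  # non-match
])
train_y = np.array([1, 1, 0, 0])

clf = SVC(kernel="linear").fit(train_X, train_y)

# The remaining unlabelled weight vectors of the cluster are then split
# into a predicted-match and a predicted-non-match sub-cluster.
rest = np.array([
    [0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958],
    [0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143],
])
pred = clf.predict(rest)
```

The two predicted sub-clusters are what re-enter the queue with the training-set purity and entropy attached, as the Loop 2 queue listing shows.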
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(157, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(629, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 629 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 629 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.667, 0.538, 0.455, 0.581, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 3 matches and 71 non-matches
Purity of oracle classification: 0.959
Entropy of oracle classification: 0.245
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)150_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 150), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)150_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 393
Number of entity ID pairs that occurred more than once: 0
Analyse set of 393 weight vectors
Containing 213 true matches and 180 true non-matches
(54.20% true matches)
Identified 356 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 338 (94.94%)
2 : 15 (4.21%)
3 : 2 (0.56%)
19 : 1 (0.28%)
Identified 1 non-pure unique weight vector (from 356 unique weight vectors)
Pureness (as a percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 178
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 177
Removed 1 non-pure weight vector
Final number of weight vectors to use: 392
Number of unique weight vectors: 356
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(356, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 356 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 356 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 31 matches and 45 non-matches
Purity of oracle classification: 0.592
Entropy of oracle classification: 0.975
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 45
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 280 weight vectors
Based on 31 matches and 45 non-matches
Classified 152 matches and 128 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(152, 0.5921052631578947, 0.9753817903274212, 0.40789473684210525)
(128, 0.5921052631578947, 0.9753817903274212, 0.40789473684210525)
Current size of match and non-match training data sets: 31 / 45
Selected cluster with (queue ordering: random):
- Purity 0.59 and entropy 0.98
- Size 152 weight vectors
- Estimated match proportion 0.408
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 152 vectors
The selected farthest weight vectors are:
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 51 matches and 7 non-matches
Purity of oracle classification: 0.879
Entropy of oracle classification: 0.531
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(20)204_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (20, 1 - acm diverg, 204), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)204_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 963
Number of entity ID pairs that occurred more than once: 0
Analyse set of 963 weight vectors
Containing 212 true matches and 751 true non-matches
(22.01% true matches)
Identified 910 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 875 (96.15%)
2 : 32 (3.52%)
3 : 2 (0.22%)
18 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 910 unique weight vectors)
Pureness (as a percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 179
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 730
Removed 1 non-pure weight vector
Final number of weight vectors to use: 962
Number of unique weight vectors: 910
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(910, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 910 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 910 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
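The "far" initial selection above can be sketched as a standard farthest-first traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest. The seed choice and the Euclidean metric below are assumptions; the log does not state them.

```python
import math

def farthest_first(vectors, k):
    """Farthest-first traversal: greedily select k vectors that are
    maximally spread out (seeded here with the first vector)."""
    selected = [vectors[0]]
    # Minimum distance from each vector to the selected set so far.
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[idx]))
    return selected
```

In the run above this step selects 87 sample vectors out of the 910 unique weight vectors.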
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
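The purity and entropy reported for an oracle-classified sample are the majority-class fraction and the binary Shannon entropy of the match / non-match split. A minimal sketch reproducing the figures above (illustrative code, not the actual implementation):

```python
import math

def purity(n_match, n_nonmatch):
    """Fraction of the majority class in the classified sample."""
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    """Binary Shannon entropy (in bits) of the match / non-match split."""
    total = n_match + n_nonmatch
    h = 0.0
    for c in (n_match, n_nonmatch):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

# Figures reported above for 24 matches and 63 non-matches:
print(round(purity(24, 63), 3))   # 0.724
print(round(entropy(24, 63), 3))  # 0.85
```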
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 823 weight vectors
Based on 24 matches and 63 non-matches
Classified 0 matches and 823 non-matches
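The SVM split step can be sketched as follows: train on the oracle-labelled sample, then divide the remaining unlabelled weight vectors into a "match" and a "non-match" child cluster. This is a hedged reconstruction; the log does not name the SVM implementation or its parameters, so scikit-learn's `SVC` with a linear kernel is an assumption.

```python
from sklearn.svm import SVC

def svm_split(labelled_vecs, labels, unlabelled_vecs):
    """Train an SVM on the oracle-labelled sample and partition the
    remaining vectors into predicted matches and non-matches."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(labelled_vecs, labels)
    preds = clf.predict(unlabelled_vecs)
    matches = [v for v, p in zip(unlabelled_vecs, preds) if p]
    non_matches = [v for v, p in zip(unlabelled_vecs, preds) if not p]
    return matches, non_matches
```

In Loop 1 above, the SVM trained on 24 matches and 63 non-matches assigned all 823 remaining vectors to the non-match side, so no match child cluster was produced.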
48.0
Analysing file: diverg(15)955_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (15, 1 - acm diverg, 955), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)955_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 894
Number of entity ID pairs that occurred more than once: 0
Analyse set of 894 weight vectors
Containing 190 true matches and 704 true non-matches
(21.25% true matches)
Identified 854 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 820 (96.02%)
2 : 31 (3.63%)
3 : 2 (0.23%)
6 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 854 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 170
0.000 : 684
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 894
Number of unique weight vectors: 854
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(854, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 854 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 854 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 27 matches and 59 non-matches
Purity of oracle classification: 0.686
Entropy of oracle classification: 0.898
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 768 weight vectors
Based on 27 matches and 59 non-matches
Classified 117 matches and 651 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(117, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
(651, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
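The two identical tuples above show that both child clusters produced by the SVM split inherit the purity, entropy, and estimated match proportion of the parent's oracle-classified sample. A minimal sketch of how each queued tuple could be derived from the oracle counts (an illustrative reconstruction, not the actual code):

```python
import math

def child_cluster_stats(size, n_match, n_nonmatch):
    """(size, purity, entropy, estimated match proportion) for a child
    cluster, inheriting the statistics of the parent's oracle sample."""
    total = n_match + n_nonmatch
    p = n_match / total                    # estimated match proportion
    purity = max(p, 1 - p)                 # majority-class fraction
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return (size, purity, entropy, p)
```

With the 27 matches and 59 non-matches classified in Loop 1, this yields the (117, 0.686..., 0.897..., 0.313...) and (651, ...) tuples shown above.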
Current size of match and non-match training data sets: 27 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.90
- Size 651 weight vectors
- Estimated match proportion 0.314
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 651 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 11 matches and 62 non-matches
Purity of oracle classification: 0.849
Entropy of oracle classification: 0.612
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing file: diverg(20)105_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 105), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)105_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 30 matches and 58 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 30 matches and 58 non-matches
Classified 179 matches and 760 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(179, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
(760, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
Current size of match and non-match training data sets: 30 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 760 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 760 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)984_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.977444
recall 0.434783
f-measure 0.601852
da 133
dm 0
ndm 0
tp 130
fp 3
tn 4.76529e+07
fn 169
Name: (10, 1 - acm diverg, 984), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)984_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 256
Number of entity ID pairs that occurred more than once: 0
Analyse set of 256 weight vectors
Containing 125 true matches and 131 true non-matches
(48.83% true matches)
Identified 243 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 233 (95.88%)
2 : 7 (2.88%)
3 : 3 (1.23%)
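A frequency table like the one above (occurrence count versus how many distinct weight vectors occur that often) can be reproduced with a short sketch; the function name `occurrence_distribution` is illustrative and not part of the tool itself:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map 'occurrence count' -> 'number of distinct weight vectors
    that occur exactly that often' in the input list."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return dict(Counter(per_vector.values()))

# Three distinct vectors: one occurs twice, two occur once.
vectors = [[0.5, 1.0], [0.5, 1.0], [0.0, 0.3], [1.0, 1.0]]
print(occurrence_distribution(vectors))  # {2: 1, 1: 2}
```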
Identified 0 non-pure unique weight vectors (from 243 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 114
0.000 : 129
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 256
Number of unique weight vectors: 243
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(243, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 243 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 69
Perform initial selection using "far" method
Farthest first selection of 69 weight vectors from 243 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
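The "farthest first" selections in this log can be sketched as a greedy farthest-point traversal over Euclidean distance; the seeding rule and the distance metric used by the actual tool are assumptions here:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily pick k vectors: each new pick maximises its minimum
    Euclidean distance to the vectors selected so far."""
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        pick = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(pick)
        remaining.remove(pick)
    return selected

sample = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.5, 0.4)]
print(farthest_first(sample, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.5, 0.4)]
```

The greedy rule spreads the sample across the whole cluster, which is why the selected vectors above mix clear matches and clear non-matches.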
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 30 matches and 39 non-matches
Purity of oracle classification: 0.565
Entropy of oracle classification: 0.988
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
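The purity and entropy reported after each oracle call follow from the match/non-match counts alone; for the 30 matches and 39 non-matches above, a binary-entropy sketch reproduces the logged 0.565 and 0.988 (purity as majority-class fraction is an assumption consistent with the figures):

```python
import math

def oracle_stats(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary entropy (bits)
    of the match proportion in the classified sample."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = oracle_stats(30, 39)
print(round(purity, 3), round(entropy, 3))  # 0.565 0.988
```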
SVM classification of 174 weight vectors
Based on 30 matches and 39 non-matches
Classified 85 matches and 89 non-matches
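The SVM step above trains on the oracle-labelled vectors and labels the remaining ones; the log does not name the implementation or kernel, so the scikit-learn `SVC` with a linear kernel below is purely an assumption:

```python
from sklearn.svm import SVC  # assumption: any binary classifier would fit here

def classify_remaining(labelled_vectors, labels, unlabelled_vectors):
    """Fit an SVM on oracle-classified weight vectors, then split the
    remaining vectors into predicted matches (1) and non-matches (0)."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(labelled_vectors, labels)
    predictions = clf.predict(unlabelled_vectors)
    matches = [v for v, p in zip(unlabelled_vectors, predictions) if p == 1]
    non_matches = [v for v, p in zip(unlabelled_vectors, predictions) if p == 0]
    return matches, non_matches

# Tiny separable example: low similarities -> non-match, high -> match.
train = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]]
labels = [0, 0, 1, 1]
matches, non_matches = classify_remaining(train, labels,
                                          [[0.95, 0.95], [0.05, 0.1]])
print(len(matches), len(non_matches))  # 1 1
```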
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 69
Size, purity, entropy, and estimated match proportion of clusters in queue:
(85, 0.5652173913043478, 0.9876925088958034, 0.43478260869565216)
(89, 0.5652173913043478, 0.9876925088958034, 0.43478260869565216)
Current size of match and non-match training data sets: 30 / 39
Selected cluster (queue ordering: random) with:
- Purity 0.57 and entropy 0.99
- Size 89 weight vectors
- Estimated match proportion 0.435
Sample size for this cluster: 46
Farthest first selection of 46 weight vectors from 89 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 46 weight vectors
The oracle will correctly classify 46 weight vectors and wrongly classify 0
Classified 2 matches and 44 non-matches
Purity of oracle classification: 0.957
Entropy of oracle classification: 0.258
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 46 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
133.0
Analysing file: diverg(20)344_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (20, 1 - acm diverg, 344), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)344_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 848
Number of entity ID pairs that occurred more than once: 0
Analyse set of 848 weight vectors
Containing 214 true matches and 634 true non-matches
(25.24% true matches)
Identified 794 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 759 (95.59%)
2 : 32 (4.03%)
3 : 2 (0.25%)
19 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 794 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 180
0.947 : 1 (minority-class weight vector with this pureness to be removed)
0.000 : 613
Removed 1 non-pure weight vector
Final number of weight vectors to use: 847
Number of unique weight vectors: 794
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(794, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 794 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 794 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 709 weight vectors
Based on 28 matches and 57 non-matches
Classified 145 matches and 564 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(564, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 145 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 145 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(20)189_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 189), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)189_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority-class weight vector with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
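The purity and entropy figures logged above can be reproduced directly from the oracle counts. A minimal sketch, assuming (as the logged values suggest) that purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion:

```python
import math

def purity_entropy(n_matches: int, n_non_matches: int):
    """Majority-class purity and binary Shannon entropy of a cluster."""
    total = n_matches + n_non_matches
    p = n_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    # Binary entropy; 0 * log2(0) is taken as 0 for pure clusters.
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 14 matches / 54 non-matches, as in the oracle summary above:
purity, entropy = purity_entropy(14, 54)
print(round(purity, 3), round(entropy, 3))  # 0.794 0.734
```

The same formula reproduces the other logged pairs, e.g. 39 matches / 35 non-matches gives purity 0.527 and entropy 0.998.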
39.0
Analyzing file: diverg(10)315_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (10, 1 - acm diverg, 315), dtype: object
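The precision, recall, and f-measure fields in the dump above follow directly from the tp/fp/fn counts; a quick check:

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall and F1 computed from raw match counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# tp=95, fp=2, fn=204 as reported for diverg(10)315_NEW.csv:
p, r, f = prf(95, 2, 204)
print(round(p, 6), round(r, 6), round(f, 6))  # 0.979381 0.317726 0.479798
```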
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)315_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 346
Number of entity ID pairs that occurred more than once: 0
Analyse set of 346 weight vectors
Containing 164 true matches and 182 true non-matches
(47.40% true matches)
Identified 330 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 320 (96.97%)
2 : 7 (2.12%)
3 : 2 (0.61%)
6 : 1 (0.30%)
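A histogram like the one above can be built by counting twice: once per weight vector, then over the counts themselves. A sketch, assuming the vectors arrive as tuples:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique weight vectors occurring that often."""
    per_vector = Counter(map(tuple, vectors))  # vector -> how often it occurs
    return dict(sorted(Counter(per_vector.values()).items()))

# Toy data: one vector appears 3 times, one twice, two appear once.
vecs = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5),
        (0.2, 0.2), (0.2, 0.2),
        (0.0, 1.0), (0.9, 0.1)]
print(occurrence_distribution(vecs))  # {1: 2, 2: 1, 3: 1}
```

The unique-vector count in the log (e.g. 330 from 346) is simply the sum of the histogram's values.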
Identified 0 non-pure unique weight vectors (from 330 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 148
0.000 : 182
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 346
Number of unique weight vectors: 330
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(330, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 330 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 74
Perform initial selection using "far" method
Farthest first selection of 74 weight vectors from 330 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
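The "farthest first" selection above is a greedy farthest-first traversal: starting from one vector, it repeatedly picks the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the cluster. A minimal sketch, assuming Euclidean distance and a fixed starting index (the log does not show the actual seeding strategy):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily select k vectors, maximising the min distance to the chosen set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # Minimum distance from each vector to the selected set so far.
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return [vectors[i] for i in selected]

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(farthest_first(pts, 3))  # corners get picked before the centre point
```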
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 39 matches and 35 non-matches
Purity of oracle classification: 0.527
Entropy of oracle classification: 0.998
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 35
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 256 weight vectors
Based on 39 matches and 35 non-matches
Classified 115 matches and 141 non-matches
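The SVM step above propagates the oracle's labels to the weight vectors that were not manually classified, splitting the cluster into a predicted-match and a predicted-non-match part. A sketch of that propagation, assuming scikit-learn's `SVC` with a linear kernel (the log does not name the library or kernel):

```python
# Requires scikit-learn; this is an illustrative sketch, not the run's exact setup.
from sklearn.svm import SVC

# Oracle-labelled training data: weight vectors plus match/non-match labels.
X_train = [[1.0, 1.0], [0.9, 0.8], [0.1, 0.2], [0.0, 0.1]]
y_train = [True, True, False, False]

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Propagate labels to the remaining, unclassified weight vectors.
X_rest = [[0.95, 0.9], [0.05, 0.0]]
pred = clf.predict(X_rest)
print(list(pred))  # [True, False]
```

The two predicted groups then re-enter the queue as separate clusters, which is why the queue length grows to 2 in the next loop.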
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 74
Size, purity, entropy, and estimated match proportion of clusters in queue:
(115, 0.527027027027027, 0.9978913098356863, 0.527027027027027)
(141, 0.527027027027027, 0.9978913098356863, 0.527027027027027)
Current size of match and non-match training data sets: 39 / 35
Selected cluster (queue ordering: random) with:
- Purity 0.53 and entropy 1.00
- Size 115 weight vectors
- Estimated match proportion 0.527
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 115 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00 accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 42 matches and 10 non-matches
Purity of oracle classification: 0.808
Entropy of oracle classification: 0.706
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 10
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analyzing file: diverg(20)626_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 626), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)626_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 799
Number of entity ID pairs that occurred more than once: 0
Analyse set of 799 weight vectors
Containing 224 true matches and 575 true non-matches
(28.04% true matches)
Identified 760 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 741 (97.50%)
2 : 16 (2.11%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 760 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority-class weight vectors at this pureness will be removed)
0.000 : 572
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 798
Number of unique weight vectors: 760
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(760, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 760 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 760 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 675 weight vectors
Based on 30 matches and 55 non-matches
Classified 149 matches and 526 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(526, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 526 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 526 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.433, 0.667, 0.500, 0.636, 0.421] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 4 matches and 71 non-matches
Purity of oracle classification: 0.947
Entropy of oracle classification: 0.300
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)48_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 48), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)48_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 689
Number of entity ID pairs that occurred more than once: 0
Analyse set of 689 weight vectors
Containing 219 true matches and 470 true non-matches
(31.79% true matches)
Identified 656 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 640 (97.56%)
2 : 13 (1.98%)
3 : 2 (0.30%)
17 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 656 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority-class weight vectors at this pureness will be removed)
0.000 : 469
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 688
Number of unique weight vectors: 656
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(656, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 656 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 656 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
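The purity and entropy values reported after each oracle round follow directly from the match/non-match counts: purity is the majority-class fraction and entropy is the binary (base-2) entropy of the match proportion. A minimal sketch (the function name is illustrative, not from the original program):

```python
import math

def purity_entropy(matches, non_matches):
    """Majority-class purity and binary entropy of a labelled sample."""
    total = matches + non_matches
    p = matches / total                    # match proportion
    purity = max(p, 1.0 - p)               # fraction in the majority class
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# The round above: 29 matches and 55 non-matches out of 84 oracle calls
purity, entropy = purity_entropy(29, 55)
print(f"{purity:.3f} {entropy:.3f}")       # 0.655 0.930
```

The estimated match proportion listed for the child clusters below (0.345...) is the same p = 29/84.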
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 572 weight vectors
Based on 29 matches and 55 non-matches
Classified 145 matches and 427 non-matches
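After an oracle round, the remaining unlabelled weight vectors are split by an SVM trained on the labelled matches and non-matches. The log does not say which SVM implementation or kernel is used; the following is only a sketch of the idea, a linear SVM trained with Pegasos-style sub-gradient descent (no bias term, all names illustrative):

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM (no bias term).
    X: list of feature vectors, y: labels in {-1, +1} (match / non-match)."""
    rnd = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rnd.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularisation shrink
            if margin < 1.0:                           # hinge-loss violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def classify(w, x):
    """+1 = match, -1 = non-match."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1
```

In the run above, the 29 + 55 oracle-labelled vectors would form X and y, and classify would then split the 572 remaining vectors into the 145/427 match and non-match clusters.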
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(427, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 145 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 145 vectors
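Farthest-first ("far") sampling starts from one vector and repeatedly adds the vector whose minimum distance to the already-selected set is largest, spreading the sample across the cluster. A minimal sketch (Euclidean distance and first-vector seeding are assumptions; the seeding rule is not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: select k well-spread vectors."""
    selected = [vectors[0]]                      # seed with the first vector
    # minimum distance from each vector to the selected set so far
    dists = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=dists.__getitem__)
        selected.append(vectors[i])
        for j, v in enumerate(vectors):          # tighten min-distances
            dists[j] = min(dists[j], math.dist(v, vectors[i]))
    return selected
```

Each greedy step costs one distance pass over the cluster, so selecting k of n vectors is O(k·n) distance computations.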
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)179_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 179), dtype: object
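The precision, recall, and f-measure figures in each per-file summary follow from the confusion counts; for the run above, tp = 39, fp = 0, fn = 260. A quick check (the function name is illustrative):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf(tp=39, fp=0, fn=260)
print(f"{p:g} {r:.6f} {f:.6f}")   # 1 0.130435 0.230769
```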
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)179_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
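The distribution above counts duplicate weight vectors: 750 vectors occur once, 16 twice, 2 three times and 1 twenty times, which accounts for all 769 unique and 808 total vectors. A sketch using collections.Counter (synthetic data standing in for the real vectors):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors with that count."""
    vec_counts = Counter(map(tuple, vectors))   # vector -> occurrences
    return Counter(vec_counts.values())         # occurrences -> #unique vectors

# Synthetic example: three distinct vectors, one of them repeated
vectors = [[0.1, 0.2], [0.1, 0.2], [0.5, 0.5], [1.0, 0.0]]
print(dict(occurrence_distribution(vectors)))   # {2: 1, 1: 2}
```

For the run above, 750·1 + 16·2 + 2·3 + 1·20 = 808 vectors in 769 unique values, matching the log.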
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
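A unique weight vector is non-pure when its duplicate occurrences carry both labels; the pureness is its match fraction, and the minority-class copies are dropped. Here the single non-pure vector (the one occurring 20 times) has pureness 0.950, i.e. 19 matches and 1 non-match, so 1 copy is removed, leaving 807 of the 808 vectors. A sketch:

```python
def remove_minority(occurrences):
    """occurrences: list of booleans (True = match) for one unique vector.
    Returns the copies kept after dropping the minority class."""
    matches = sum(occurrences)
    pureness = matches / len(occurrences)
    if pureness in (0.0, 1.0):              # already pure: keep everything
        return occurrences
    majority = matches > len(occurrences) - matches
    return [o for o in occurrences if o == majority]

copies = [True] * 19 + [False]              # pureness 19/20 = 0.950
print(len(remove_minority(copies)))         # 19
```

Ties (an exact 50/50 split) are an edge case this sketch resolves in favour of non-matches; the log never shows one.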
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 28 matches and 57 non-matches
Classified 141 matches and 543 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 141 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 141 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)786_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 786), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)786_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 460
Number of entity ID pairs that occurred more than once: 0
Analyse set of 460 weight vectors
Containing 210 true matches and 250 true non-matches
(45.65% true matches)
Identified 426 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 409 (96.01%)
2 : 14 (3.29%)
3 : 2 (0.47%)
17 : 1 (0.23%)
Identified 1 non-pure unique weight vector (from 426 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 247
Removed 1 non-pure weight vector
Final number of weight vectors to use: 459
Number of unique weight vectors: 426
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(426, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 426 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 426 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 37 matches and 41 non-matches
Purity of oracle classification: 0.526
Entropy of oracle classification: 0.998
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 348 weight vectors
Based on 37 matches and 41 non-matches
Classified 246 matches and 102 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(246, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
(102, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
Current size of match and non-match training data sets: 37 / 41
Selected cluster with (queue ordering: random):
- Purity 0.53 and entropy 1.00
- Size 246 weight vectors
- Estimated match proportion 0.474
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 246 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.902, 1.000, 0.182, 0.071, 0.182, 0.222, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 43 matches and 26 non-matches
Purity of oracle classification: 0.623
Entropy of oracle classification: 0.956
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(15)624_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 624), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)624_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 997
Number of entity ID pairs that occurred more than once: 0
Analyse set of 997 weight vectors
Containing 222 true matches and 775 true non-matches
(22.27% true matches)
Identified 943 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 906 (96.08%)
2 : 34 (3.61%)
3 : 2 (0.21%)
17 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 943 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 754
Removed 1 non-pure weight vector
Final number of weight vectors to use: 996
Number of unique weight vectors: 943
Time to load and analyse the weight vector file: 0.01 sec
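The non-pure-vector cleanup above (997 vectors in, 996 out) drops only the minority-class copies of any unique weight vector whose occurrences carry mixed labels. A hedged reconstruction, assuming a simple majority vote per unique vector (not the original implementation):

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    # Group identical weight vectors, then keep only the copies that
    # carry their group's majority label, so every surviving unique
    # vector is pure. Illustrative reconstruction, not the original code.
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[tuple(vec)].append(label)
    kept_vectors, kept_labels = [], []
    for vec, label in zip(vectors, labels):
        group = groups[tuple(vec)]
        majority = sum(group) * 2 >= len(group)  # True iff matches dominate
        if label == majority:
            kept_vectors.append(vec)
            kept_labels.append(label)
    return kept_vectors, kept_labels
```

On the run above, the single unique vector occurring 17 times with pureness 0.941 (16 matches, 1 non-match) would lose its one non-match copy, taking the set from 997 to 996 vectors.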
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(943, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 943 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
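The per-cluster sample sizes (87 of 943 here, 69 of 302 in the next loop) are consistent, up to rounding, with the standard finite-population sample-size formula for estimating a proportion at 95% confidence with a 0.1 margin of error, using the cluster's estimated match proportion. This is inferred from the logged numbers, not taken from the original code:

```python
def sample_size(population, est_match_prop, z=1.96, margin=0.1):
    # Finite-population sample size for estimating a proportion
    # (z = 1.96 for 95% confidence; margin = tolerated error).
    pq = est_match_prop * (1.0 - est_match_prop)
    numerator = population * z ** 2 * pq
    denominator = (population - 1) * margin ** 2 + z ** 2 * pq
    return round(numerator / denominator)

print(sample_size(943, 0.5))      # 87  (Loop 1 above)
print(sample_size(302, 32 / 87))  # 69  (Loop 2 above)
```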
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 943 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
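The "far" selection above is a greedy farthest-first traversal: after a start vector, repeatedly pick the vector whose distance to its closest already-selected vector is largest. A self-contained sketch of that idea (the seeding and the Euclidean metric are assumptions; the original code may differ):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    # Greedy farthest-first traversal over the weight vectors.
    # Sketch only; seeding and distance metric are assumed.
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # min_dist[i] = distance from vector i to the closest selected vector
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

Each selected vector is then labeled by the oracle, as in the classification block that follows.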
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 32 matches and 55 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 856 weight vectors
Based on 32 matches and 55 non-matches
Classified 302 matches and 554 non-matches
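The split above trains a classifier on the 87 oracle-labeled vectors and partitions the remaining 856 into predicted matches and non-matches, which then go back on the cluster queue. A sketch using scikit-learn's SVC (the kernel and other settings are assumptions; the log only says "SVM"):

```python
import numpy as np
from sklearn.svm import SVC

def split_cluster(sample_vectors, sample_labels, remaining_vectors):
    # Train an SVM on the oracle-labeled sample, then split the
    # unlabeled remainder of the cluster by its predictions.
    # Sketch: the original kernel/parameters are unknown; a linear
    # kernel is assumed here.
    clf = SVC(kernel="linear")
    clf.fit(np.asarray(sample_vectors), np.asarray(sample_labels))
    preds = clf.predict(np.asarray(remaining_vectors))
    matches = [v for v, p in zip(remaining_vectors, preds) if p == 1]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if p == 0]
    return matches, non_matches
```

Both sub-clusters are then re-queued with the purity, entropy, and match proportion of the oracle sample, which is what the Loop 2 queue entries show.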
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(302, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
(554, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
Current size of match and non-match training data sets: 32 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 302 weight vectors
- Estimated match proportion 0.368
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 302 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 44 matches and 25 non-matches
Purity of oracle classification: 0.638
Entropy of oracle classification: 0.945
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)655_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 655), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)655_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1043
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1043 weight vectors
Containing 222 true matches and 821 true non-matches
(21.28% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 952 (96.26%)
2 : 34 (3.44%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as the proportion of matches) of each unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 800
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1042
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 902 weight vectors
Based on 26 matches and 61 non-matches
Classified 145 matches and 757 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(757, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 145 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 145 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(10)143_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 143), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)143_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 881
Number of entity ID pairs that occurred more than once: 0
Analyse set of 881 weight vectors
Containing 212 true matches and 669 true non-matches
(24.06% true matches)
Identified 829 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 793 (95.66%)
2 : 33 (3.98%)
3 : 2 (0.24%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 829 unique weight vectors)
Pureness (as the proportion of matches) of each unique weight vector:
Pureness : Count
1.000 : 180
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 648
Removed 1 non-pure weight vector
Final number of weight vectors to use: 880
Number of unique weight vectors: 829
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(829, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 829 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 829 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 743 weight vectors
Based on 29 matches and 57 non-matches
Classified 162 matches and 581 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(162, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(581, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 581 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 581 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 0 matches and 75 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 75
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
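The repeated "Farthest first selection of k weight vectors from n vectors" steps above can be sketched as a standard farthest-first traversal (greedy k-center seeding): start from one vector, then repeatedly add the vector whose distance to the already-selected set is largest. This is a generic reconstruction under that assumption, not the original implementation:

```python
import math
import random

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: pick k vectors by repeatedly
    adding the vector farthest from the current selection."""
    random.seed(seed)
    vectors = list(vectors)
    selected = [random.choice(vectors)]
    # dist[i] = distance from vectors[i] to its nearest selected vector
    dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        i = max(range(len(vectors)), key=dist.__getitem__)
        selected.append(vectors[i])
        dist = [min(dist[j], math.dist(vectors[j], vectors[i]))
                for j in range(len(vectors))]
    return selected

# Toy example: pick 3 of 5 two-dimensional vectors.
pts = [(0, 0), (0.1, 0.1), (1, 1), (0.9, 1), (0.5, 0.5)]
chosen = farthest_first(pts, 3)
print(len(chosen))  # 3
```

Farthest-first selection spreads the labelling budget across the cluster, which is why the sampled vectors above tend to sit at the extremes of the similarity space.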
43.0
Analysing the file: diverg(10)407_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 407), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)407_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 996
Number of entity ID pairs that occurred more than once: 0
Analyse set of 996 weight vectors
Containing 221 true matches and 775 true non-matches
(22.19% true matches)
Identified 942 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 905 (96.07%)
2 : 34 (3.61%)
3 : 2 (0.21%)
17 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 942 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 187
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 754
Removed 1 non-pure weight vector
Final number of weight vectors to use: 995
Number of unique weight vectors: 942
Time to load and analyse the weight vector file: 0.01 sec
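The uniqueness and frequency-distribution figures above come from counting duplicate weight vectors. A minimal sketch over toy data (the real runs use the 7-dimensional similarity vectors loaded from the CSV file):

```python
from collections import Counter

# Toy weight vectors, as hashable tuples.
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5),
           (0.2, 0.8), (0.2, 0.8), (0.2, 0.8)]

counts = Counter(vectors)
print("Number of weight vectors:", len(vectors))
print("Number of unique weight vectors:", len(counts))

# Occurrence : number of unique vectors occurring that often
# (percentages are relative to the unique count, as in the log).
freq = Counter(counts.values())
for occurrence in sorted(freq):
    n = freq[occurrence]
    print(f"{occurrence} : {n} ({100 * n / len(counts):.2f}%)")
```

On the real file this reproduces, e.g., "1 : 905 (96.07%)", since 905 of the 942 unique vectors occur exactly once.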
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(942, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 942 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 942 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 855 weight vectors
Based on 31 matches and 56 non-matches
Classified 301 matches and 554 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(301, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(554, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 301 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 301 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.600, 1.000, 0.217, 0.132, 0.167, 0.125, 0.188] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 42 matches and 26 non-matches
Purity of oracle classification: 0.618
Entropy of oracle classification: 0.960
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(10)173_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 173), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)173_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 443
Number of entity ID pairs that occurred more than once: 0
Analyse set of 443 weight vectors
Containing 205 true matches and 238 true non-matches
(46.28% true matches)
Identified 417 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 403 (96.64%)
2 : 11 (2.64%)
3 : 2 (0.48%)
12 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 417 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 237
Removed 1 non-pure weight vector
Final number of weight vectors to use: 442
Number of unique weight vectors: 417
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(417, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 417 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 417 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 35 matches and 43 non-matches
Purity of oracle classification: 0.551
Entropy of oracle classification: 0.992
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 339 weight vectors
Based on 35 matches and 43 non-matches
Classified 139 matches and 200 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(139, 0.5512820512820513, 0.9923985003332222, 0.44871794871794873)
(200, 0.5512820512820513, 0.9923985003332222, 0.44871794871794873)
Current size of match and non-match training data sets: 35 / 43
Selected cluster with (queue ordering: random):
- Purity 0.55 and entropy 0.99
- Size 200 weight vectors
- Estimated match proportion 0.449
Sample size for this cluster: 64
Farthest first selection of 64 weight vectors from 200 vectors
The selected farthest weight vectors are:
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.367, 1.000, 0.160, 0.170, 0.077, 0.200, 0.178] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.483, 0.818, 0.773, 0.478, 0.727] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 9 matches and 55 non-matches
Purity of oracle classification: 0.859
Entropy of oracle classification: 0.586
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(15)929_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 929), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)929_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 814
Number of entity ID pairs that occurred more than once: 0
Analyse set of 814 weight vectors
Containing 227 true matches and 587 true non-matches
(27.89% true matches)
Identified 757 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 720 (95.11%)
2 : 34 (4.49%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 757 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 566
Removed 1 non-pure weight vector
Final number of weight vectors to use: 813
Number of unique weight vectors: 757
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(757, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 757 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 757 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
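The "far" method logged above is a greedy farthest-first traversal: seed with one vector, then repeatedly add the vector whose minimum Euclidean distance to the already-selected set is largest. A minimal sketch (the seed choice and the distance metric are assumptions):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: each new pick maximises its
    minimum Euclidean distance to the vectors selected so far."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]           # arbitrary seed (assumption)
    # Minimum distance from every vector to the selected set
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            min_dist[j] = min(min_dist[j], dist(v, vectors[i]))
    return selected
```

Keeping the `min_dist` array makes each of the k rounds a single pass over the vectors, O(nk) distance computations in total.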
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
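The purity and entropy figures reported for an oracle classification follow the usual two-class definitions: with m matches and n non-matches, purity is max(m, n)/(m + n) and entropy is the binary Shannon entropy of the match proportion m/(m + n). Checking against the 29 matches and 56 non-matches above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Two-class purity and binary Shannon entropy of a labelled set."""
    total = num_matches + num_non_matches
    purity = max(num_matches, num_non_matches) / total
    p = num_matches / total           # match proportion
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:                     # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(29, 56)
print(round(purity, 3), round(entropy, 3))   # 0.659 0.926
```

These match the logged purity of 0.659 and entropy of 0.926; a perfectly balanced cluster gives purity 0.5 and entropy 1.0, as in the initial queue entry.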
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 672 weight vectors
Based on 29 matches and 56 non-matches
Classified 160 matches and 512 non-matches
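The split step trains an SVM on the oracle-labelled sample and uses it to divide the remaining vectors of the cluster into predicted matches and predicted non-matches. As a self-contained stand-in for a library SVM, here is a minimal linear SVM fitted by stochastic subgradient descent on the L2-regularised hinge loss; all hyperparameters are illustrative:

```python
import random

def train_linear_svm(X, y, epochs=300, lr=0.05, lam=0.001):
    """Fit a linear SVM by stochastic subgradient descent on the
    L2-regularised hinge loss.  Labels y must be in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    rng = random.Random(0)            # fixed seed for reproducibility
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:            # hinge loss active: step towards y_i * x_i
                w = [wj + lr * (y[i] * xj - lam * wj)
                     for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:                     # only the regulariser contributes
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def split_cluster(cluster, w, b):
    """Partition unlabelled weight vectors by the sign of the SVM score."""
    matches, non_matches = [], []
    for v in cluster:
        score = sum(wj * xj for wj, xj in zip(w, v)) + b
        (matches if score >= 0 else non_matches).append(v)
    return matches, non_matches
```

In the run above this is exactly the 29-match / 56-non-match training set splitting the remaining 672 vectors into child clusters of 160 and 512, which then re-enter the queue.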
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(160, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(512, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 160 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 160 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
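Each run above repeats the same control flow until the manual-classification budget is spent: pop a cluster from the queue, sample it farthest-first, ask the oracle to label the sample, and, if the cluster is still impure or too large, split the unlabelled remainder with a classifier and queue the children. A schematic sketch (the sample-size rule, thresholds, and helper names are all assumptions, not the tool's actual interface):

```python
def active_learning_loop(vectors, oracle, sample_fn, split_fn,
                         budget, purity_threshold=0.95, max_size=20):
    """Cluster-queue active learning sketch: spend oracle labels on
    samples of queued clusters until the manual budget is exhausted."""
    queue = [vectors]                 # queue of clusters (FIFO here)
    labelled = []                     # (vector, label) pairs so far
    used = 0                          # oracle classifications spent
    while queue and used < budget:
        cluster = queue.pop(0)
        sample = sample_fn(cluster, min(len(cluster), budget - used))
        labels = [oracle(v) for v in sample]
        used += len(sample)
        labelled += list(zip(sample, labels))
        purity = max(sum(labels), len(labels) - sum(labels)) / len(labels)
        # Sampled vectors are deleted from the cluster, as in the log
        remainder = [v for v in cluster if v not in sample]
        if remainder and (purity < purity_threshold
                          or len(remainder) > max_size):
            # Cluster not pure enough or too large: split it further
            for child in split_fn(remainder, labelled):
                if child:
                    queue.append(child)
    return labelled
```

The log's "queue ordering: random" suggests the real tool pops a random cluster rather than the FIFO head used in this sketch.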
39.0
Analysing file: diverg(10)419_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 419), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)419_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 589
Number of entity ID pairs that occurred more than once: 0
Analyse set of 589 weight vectors
Containing 206 true matches and 383 true non-matches
(34.97% true matches)
Identified 555 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 537 (96.76%)
2 : 15 (2.70%)
3 : 2 (0.36%)
16 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 555 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 380
Removed 1 non-pure weight vector
Final number of weight vectors to use: 588
Number of unique weight vectors: 555
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(555, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 555 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 555 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 30 matches and 52 non-matches
Purity of oracle classification: 0.634
Entropy of oracle classification: 0.947
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 473 weight vectors
Based on 30 matches and 52 non-matches
Classified 148 matches and 325 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6341463414634146, 0.9474351361840306, 0.36585365853658536)
(325, 0.6341463414634146, 0.9474351361840306, 0.36585365853658536)
Current size of match and non-match training data sets: 30 / 52
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 148 weight vectors
- Estimated match proportion 0.366
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 148 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 50 matches and 6 non-matches
Purity of oracle classification: 0.893
Entropy of oracle classification: 0.491
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(15)681_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 681), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)681_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 118 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 118 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
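The purity and entropy figures logged for each oracle-labelled sample (e.g. 0.979 and 0.146 for the 47/1 split above) can be reproduced with a minimal sketch. This assumes purity is the majority-class fraction and entropy is the binary Shannon entropy in bits, which matches the logged numbers; the function name is illustrative, not the tool's actual API:

```python
import math

def purity_and_entropy(n_matches, n_non_matches):
    """Purity: fraction of the majority class. Entropy: binary Shannon
    entropy (in bits) of the match/non-match split."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy

# 47 matches / 1 non-match, as in the sample above
print(purity_and_entropy(47, 1))  # approx (0.979, 0.146)
```

The same function reproduces the other logged samples, e.g. 29 matches / 54 non-matches gives purity 0.651 and entropy 0.934.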
42.0
Analysing file: diverg(15)274_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 274), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)274_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 637
Number of entity ID pairs that occurred more than once: 0
Analyse set of 637 weight vectors
Containing 195 true matches and 442 true non-matches
(30.61% true matches)
Identified 610 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 594 (97.38%)
2 : 13 (2.13%)
3 : 2 (0.33%)
11 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 610 unique weight vectors)
Pureness (as fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 170
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 439
Removed 1 non-pure weight vector
Final number of weight vectors to use: 636
Number of unique weight vectors: 610
Time to load and analyse the weight vector file: 0.01 sec
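The frequency distribution and per-vector pureness reported above can be sketched as follows; the data here are tiny hypothetical stand-ins for the CSV contents, and `analyse_weight_vectors` is an illustrative name, not the tool's API:

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Count occurrences of each unique weight vector and the fraction
    of its copies labelled as true matches (its 'pureness')."""
    occurrences = Counter(tuple(v) for v in vectors)
    # occurrence count -> number of unique vectors occurring that often
    freq_dist = Counter(occurrences.values())
    true_counts = Counter(tuple(v) for v, y in zip(vectors, labels) if y)
    pureness = {v: true_counts[v] / n for v, n in occurrences.items()}
    return freq_dist, pureness

# One vector repeated three times with mixed labels -> non-pure (2/3)
vecs = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5), (0.2, 0.9)]
labs = [True, True, False, False]
freq, pure = analyse_weight_vectors(vecs, labs)
```

A unique vector whose pureness lies strictly between 0 and 1 is non-pure; the log shows its minority-class copies being removed before clustering.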
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(610, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 610 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 610 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 29 matches and 54 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.934
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 527 weight vectors
Based on 29 matches and 54 non-matches
Classified 144 matches and 383 non-matches
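The SVM step above trains on the oracle-labelled sample and splits the remaining cluster by predicted class. As a dependency-free stand-in for the SVM (a plain perceptron, not the actual classifier used), the split can be sketched like this; the function names and toy data are illustrative:

```python
def train_perceptron(X, y, epochs=50, lr=0.1):
    """Stand-in for the SVM step: a plain perceptron trained on the
    oracle-labelled sample (y is True for match, False for non-match)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            target = 1.0 if yi else -1.0
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if target * score <= 0:  # misclassified: nudge the boundary
                w = [wj + lr * target * xj for wj, xj in zip(w, xi)]
                b += lr * target
    return w, b

def split_cluster(vectors, w, b):
    """Split the unlabelled remainder into predicted match / non-match."""
    matches, non_matches = [], []
    for v in vectors:
        score = sum(wj * xj for wj, xj in zip(w, v)) + b
        (matches if score > 0 else non_matches).append(v)
    return matches, non_matches

# Hypothetical toy sample standing in for the oracle-labelled vectors
X_train = [(0.9, 0.9), (0.8, 1.0), (0.1, 0.2), (0.2, 0.1)]
y_train = [True, True, False, False]
w, b = train_perceptron(X_train, y_train)
# (1.0, 0.95) lands in matches, (0.05, 0.1) in non_matches
matches, non_matches = split_cluster([(1.0, 0.95), (0.05, 0.1)], w, b)
```

Each predicted sub-cluster is then pushed back onto the queue, which is why the queue length grows to 2 in the next loop.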
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
(383, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
Current size of match and non-match training data sets: 29 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 383 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 383 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.000, 0.870, 0.619, 0.643, 0.700, 0.524] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.538, 0.613, 0.789, 0.227, 0.857] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 4 matches and 67 non-matches
Purity of oracle classification: 0.944
Entropy of oracle classification: 0.313
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
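The "far" initial selection logged in each loop is a farthest-first traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and the first vector as seed (the real seeding may differ):

```python
def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of weight vectors."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        # pick the remaining vector farthest from its nearest selected one
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

pts = [(0.0,), (0.1,), (0.5,), (1.0,)]
print(farthest_first(pts, 3))  # [(0.0,), (1.0,), (0.5,)]
```

This spreads the sample across the cluster, which is why the selected vectors above mix clear matches and clear non-matches.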
58.0
Analysing file: diverg(20)332_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 332), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)332_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.429, 0.786, 0.750, 0.389, 0.857] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 30 matches and 55 non-matches
Classified 147 matches and 537 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(147, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(537, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 147 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 147 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 53 matches and 2 non-matches
Purity of oracle classification: 0.964
Entropy of oracle classification: 0.225
Number of true matches: 53
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)675_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (20, 1 - acm diverg, 675), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)675_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 920
Number of entity ID pairs that occurred more than once: 0
Analyse set of 920 weight vectors
Containing 215 true matches and 705 true non-matches
(23.37% true matches)
Identified 868 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 832 (95.85%)
2 : 33 (3.80%)
3 : 2 (0.23%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 868 unique weight vectors)
Pureness (as fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 183
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 684
Removed 1 non-pure weight vector
Final number of weight vectors to use: 919
Number of unique weight vectors: 868
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(868, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 868 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 868 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 782 weight vectors
Based on 28 matches and 58 non-matches
Classified 158 matches and 624 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(158, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(624, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
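The purity and entropy figures printed for each cluster appear to be the majority-class fraction and the binary entropy of the estimated match proportion. A minimal sketch under that assumption, reproducing the values above from the 28 matches and 58 non-matches of the oracle round:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Majority-class purity and binary entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total                  # estimated match proportion
    purity = max(p, 1.0 - p)                 # fraction of the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):                   # binary entropy, in bits
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy, p

# 28 matches and 58 non-matches, as in the oracle classification above
purity, entropy, p = cluster_stats(28, 58)
print(purity, entropy, p)  # 0.6744186..., 0.9103480..., 0.3255813...
```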
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 158 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 158 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(10)4_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 4), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)4_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 991
Number of entity ID pairs that occurred more than once: 0
Analyse set of 991 weight vectors
Containing 194 true matches and 797 true non-matches
(19.58% true matches)
Identified 949 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 914 (96.31%)
2 : 32 (3.37%)
3 : 2 (0.21%)
7 : 1 (0.11%)
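The occurrence histogram above can be reproduced by counting how often each weight vector appears, then counting those frequencies; a small sketch with made-up vectors, assuming vectors compare equal componentwise:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to the number of distinct vectors occurring that often."""
    vec_counts = Counter(tuple(v) for v in weight_vectors)  # how often each vector occurs
    return Counter(vec_counts.values())                     # histogram of those frequencies

vectors = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8], [0.2, 0.8], [0.2, 0.8]]
print(sorted(occurrence_distribution(vectors).items()))  # [(1, 1), (2, 1), (3, 1)]
```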
Identified 0 non-pure unique weight vectors (from 949 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 172
0.000 : 777
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 991
Number of unique weight vectors: 949
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(949, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 949 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 949 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
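The "farthest first selection" lines above suggest the classic farthest-first traversal: starting from a seed, repeatedly pick the vector whose minimum distance to the already-selected set is largest. A sketch under that assumption (Euclidean distance, first vector as seed; the real seeding and metric are not stated in the log):

```python
def farthest_first(vectors, k):
    """Greedy farthest-first traversal: each pick maximises the minimum
    distance to the vectors selected so far."""
    def dist2(a, b):  # squared Euclidean distance is enough for comparisons
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [vectors[0]]
    # minimum squared distance from every vector to the selected set
    min_d = [dist2(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(vectors[idx])
        min_d = [min(d, dist2(v, vectors[idx])) for d, v in zip(min_d, vectors)]
    return selected

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(points, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```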
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 862 weight vectors
Based on 30 matches and 57 non-matches
Classified 287 matches and 575 non-matches
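The SVM step trains on the oracle-labelled vectors and splits the remaining unlabelled vectors into predicted-match and predicted-non-match clusters. A minimal sketch using scikit-learn's `SVC` on synthetic stand-in data (the kernel, parameters, and data are assumptions, not taken from the log):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy stand-ins for oracle-labelled 7-dimensional weight vectors
X_train = np.vstack([rng.uniform(0.6, 1.0, (30, 7)),   # match-like vectors (label 1)
                     rng.uniform(0.0, 0.4, (57, 7))])  # non-match-like vectors (label 0)
y_train = np.array([1] * 30 + [0] * 57)

clf = SVC(kernel="rbf").fit(X_train, y_train)

# split the remaining, unlabelled vectors into two predicted clusters
X_rest = rng.uniform(0.0, 1.0, (100, 7))
pred = clf.predict(X_rest)
matches = X_rest[pred == 1]
non_matches = X_rest[pred == 0]
print(len(matches), len(non_matches))  # the two cluster sizes sum to 100
```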
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(287, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(575, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 575 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 575 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.333, 0.545, 0.476, 0.727, 0.762] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 0 matches and 75 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 75
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(15)539_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 539), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)539_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 953
Number of entity ID pairs that occurred more than once: 0
Analyse set of 953 weight vectors
Containing 201 true matches and 752 true non-matches
(21.09% true matches)
Identified 908 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 874 (96.26%)
2 : 31 (3.41%)
3 : 2 (0.22%)
11 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 908 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 176
0.909 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 952
Number of unique weight vectors: 908
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(908, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 908 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 908 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 25 matches and 62 non-matches
Purity of oracle classification: 0.713
Entropy of oracle classification: 0.865
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 821 weight vectors
Based on 25 matches and 62 non-matches
Classified 110 matches and 711 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(110, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
(711, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 25 / 62
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 711 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 711 vectors
The selected farthest weight vectors are:
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.368, 0.235, 0.429] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 13 matches and 58 non-matches
Purity of oracle classification: 0.817
Entropy of oracle classification: 0.687
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(20)577_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (20, 1 - acm diverg, 577), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)577_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1036
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1036 weight vectors
Containing 188 true matches and 848 true non-matches
(18.15% true matches)
Identified 994 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 963 (96.88%)
2 : 28 (2.82%)
3 : 2 (0.20%)
11 : 1 (0.10%)
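The occurrence statistics above (unique vectors, then a frequency distribution of how often each one occurs) can be built in two passes with `collections.Counter`; the vectors below are hypothetical stand-ins for the rows loaded from the CSV file:

```python
from collections import Counter

# Hypothetical weight vectors, stored as tuples so they are hashable
vectors = [
    (1.0, 0.5), (1.0, 0.5),
    (0.2, 0.3), (0.2, 0.3), (0.2, 0.3),
    (0.9, 0.1),
]

occurrence = Counter(vectors)             # unique vector -> how often it occurs
freq_dist = Counter(occurrence.values())  # occurrence count -> number of unique vectors
```

`len(occurrence)` then gives the number of unique weight vectors.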
Identified 1 non-pure unique weight vector (from 994 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 827
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1035
Number of unique weight vectors: 994
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(994, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 994 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 994 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
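The log does not show how the farthest first selection is implemented; below is a minimal sketch of the standard greedy farthest-first traversal (seed with an arbitrary vector, then repeatedly take the vector farthest from everything selected so far), assuming Euclidean distance between weight vectors:

```python
import math

def farthest_first(vectors, k):
    # Greedy farthest-first traversal: seed with the first vector, then
    # repeatedly add the vector whose distance to its nearest already
    # selected vector is largest.
    selected = [vectors[0]]
    # Distance of every vector to its nearest selected vector so far
    dists = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=dists.__getitem__)
        selected.append(vectors[idx])
        dists = [min(d, math.dist(v, vectors[idx])) for d, v in zip(dists, vectors)]
    return selected
```

Each step updates n distances, so selecting k of n vectors costs O(nk) distance computations.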
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 907 weight vectors
Based on 23 matches and 64 non-matches
Classified 77 matches and 830 non-matches
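The SVM step itself is not shown in the log; a hedged sketch of how the remaining cluster vectors could be classified from the oracle-labelled sample using scikit-learn's `SVC` (the kernel choice and all vectors below are illustrative assumptions, not values from this run):

```python
from sklearn.svm import SVC

# Hypothetical oracle-labelled sample: high-similarity weight vectors
# are matches (True), low-similarity ones non-matches (False)
train_X = [[0.90, 0.95], [0.95, 1.00], [1.00, 0.85],
           [0.10, 0.20], [0.20, 0.10], [0.05, 0.15]]
train_y = [True, True, True, False, False, False]

clf = SVC(kernel="linear")  # assumed kernel; the log does not say which is used
clf.fit(train_X, train_y)

# Classify the weight vectors that remain in the cluster
remaining = [[0.92, 0.88], [0.15, 0.12]]
pred = clf.predict(remaining)
```

The predicted matches and non-matches then form the two child clusters pushed back onto the queue.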
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(77, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(830, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 77 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 38
Farthest first selection of 38 weight vectors from 77 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
Perform oracle with 100.00% accuracy on 38 weight vectors
The oracle will correctly classify 38 weight vectors and wrongly classify 0
Classified 38 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 38 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(15)665_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 665), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)665_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 29 matches and 59 non-matches
Purity of oracle classification: 0.670
Entropy of oracle classification: 0.914
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 29 matches and 59 non-matches
Classified 162 matches and 777 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(162, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
(777, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
Current size of match and non-match training data sets: 29 / 59
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 162 weight vectors
- Estimated match proportion 0.330
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 162 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)660_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 660), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)660_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 346
Number of entity ID pairs that occurred more than once: 0
Analyse set of 346 weight vectors
Containing 212 true matches and 134 true non-matches
(61.27% true matches)
Identified 312 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 294 (94.23%)
2 : 15 (4.81%)
3 : 2 (0.64%)
16 : 1 (0.32%)
Identified 1 non-pure unique weight vector (from 312 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 131
Removed 1 non-pure weight vector
Final number of weight vectors to use: 345
Number of unique weight vectors: 312
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(312, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 312 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 73
Perform initial selection using "far" method
Farthest first selection of 73 weight vectors from 312 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 34 matches and 39 non-matches
Purity of oracle classification: 0.534
Entropy of oracle classification: 0.997
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 239 weight vectors
Based on 34 matches and 39 non-matches
Classified 150 matches and 89 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 73
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.5342465753424658, 0.9966132830150964, 0.4657534246575342)
(89, 0.5342465753424658, 0.9966132830150964, 0.4657534246575342)
Current size of match and non-match training data sets: 34 / 39
Selected cluster with (queue ordering: random):
- Purity 0.53 and entropy 1.00
- Size 89 weight vectors
- Estimated match proportion 0.466
Sample size for this cluster: 46
Farthest first selection of 46 weight vectors from 89 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.800, 0.636, 0.563, 0.545, 0.722] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 46 weight vectors
The oracle will correctly classify 46 weight vectors and wrongly classify 0
Classified 2 matches and 44 non-matches
Purity of oracle classification: 0.957
Entropy of oracle classification: 0.258
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 46 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
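The purity and entropy figures reported throughout this log (here 0.957 and 0.258 for a 2/44 split) follow the usual majority-class-fraction and binary Shannon-entropy definitions. A minimal sketch, assuming only the match/non-match counts are needed:

```python
import math

def purity_and_entropy(num_match, num_non_match):
    """Purity = fraction of the majority class; entropy = binary Shannon
    entropy (in bits) of the match / non-match split."""
    total = num_match + num_non_match
    p_match = num_match / total
    purity = max(p_match, 1.0 - p_match)
    entropy = 0.0
    for p in (p_match, 1.0 - p_match):
        if p > 0.0:
            entropy -= p * math.log2(p)
    return purity, entropy

purity, entropy = purity_and_entropy(2, 44)
# purity ≈ 0.957, entropy ≈ 0.258, matching the classification above
```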
Reached end of manual classification budget
43.0
Analysing file: diverg(10)864_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987952
recall 0.274247
f-measure 0.429319
da 83
dm 0
ndm 0
tp 82
fp 1
tn 4.76529e+07
fn 217
Name: (10, 1 - acm diverg, 864), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)864_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 502
Number of entity ID pairs that occurred more than once: 0
Analyse set of 502 weight vectors
Containing 154 true matches and 348 true non-matches
(30.68% true matches)
Identified 485 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 477 (98.35%)
2 : 5 (1.03%)
3 : 2 (0.41%)
9 : 1 (0.21%)
Identified 1 non-pure unique weight vector (from 485 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 137
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 347
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 493
Number of unique weight vectors: 484
Time to load and analyse the weight vector file: 0.00 sec
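The duplicate analysis above (unique vectors plus the occurrence-frequency table) amounts to two nested counts. A minimal sketch, assuming each weight vector is a sequence of floats; the toy data below is hypothetical:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each distinct weight vector occurs, then count how
    many distinct vectors share each occurrence count."""
    per_vector = Counter(map(tuple, weight_vectors))    # vector -> count
    distribution = Counter(per_vector.values())         # count -> #vectors
    return len(per_vector), dict(sorted(distribution.items()))

# hypothetical toy data: three copies of one vector, the rest unique
vectors = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5), (0.2, 0.9), (0.4, 0.1)]
unique, dist = occurrence_distribution(vectors)
# unique == 3, dist == {1: 2, 3: 1}
```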
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(484, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 484 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 484 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.714, 0.353, 0.583, 0.571] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.667, 0.000, 0.850, 0.733, 0.652, 0.778, 0.474] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
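The farthest-first selections in this log can be sketched as a greedy max-min traversal: repeatedly pick the vector whose minimum distance to the already chosen ones is largest. A minimal version, assuming Euclidean distance and the first vector as seed (the actual seeding rule is not visible in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors: start with the first one, then repeatedly
    add the vector whose minimum distance to the selected set is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]
    # minimum distance from every candidate to the selected set so far
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[idx]))
    return selected

# toy example: from points on a line, the extremes are picked first
pts = [(0.0,), (0.1,), (0.5,), (0.9,), (1.0,)]
# farthest_first(pts, 3) → [(0.0,), (1.0,), (0.5,)]
```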
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 27 matches and 53 non-matches
Purity of oracle classification: 0.662
Entropy of oracle classification: 0.922
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 404 weight vectors
Based on 27 matches and 53 non-matches
Classified 114 matches and 290 non-matches
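The SVM step above trains on the oracle-labelled vectors and splits the remaining unlabelled ones into predicted match and non-match clusters. Roughly, using scikit-learn's SVC (the kernel and parameters here are assumptions, not taken from the log):

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Fit an SVM on the oracle-labelled weight vectors and split the
    remaining vectors into predicted matches and non-matches."""
    clf = SVC(kernel="rbf")  # kernel choice is an assumption
    clf.fit(np.asarray(train_vectors), np.asarray(train_labels))
    preds = clf.predict(np.asarray(remaining_vectors))
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches

# hypothetical toy data: two well-separated clusters
matches, non_matches = svm_split(
    [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]],
    [False, False, True, True],
    [[0.05, 0.0], [0.95, 1.0]],
)
# matches → [[0.95, 1.0]], non_matches → [[0.05, 0.0]]
```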
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(114, 0.6625, 0.9224062617590723, 0.3375)
(290, 0.6625, 0.9224062617590723, 0.3375)
Current size of match and non-match training data sets: 27 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 290 weight vectors
- Estimated match proportion 0.338
Sample size for this cluster: 66
Farthest first selection of 66 weight vectors from 290 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.500, 0.452, 0.632, 0.714, 0.667] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 0.000, 0.750, 0.667, 0.444, 0.765, 0.714] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.667, 0.000, 0.800, 0.684, 0.667, 0.529, 0.609] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.481, 0.643, 0.667, 0.350, 0.643] (False)
Perform oracle with 100.00% accuracy on 66 weight vectors
The oracle will correctly classify 66 weight vectors and wrongly classify 0
Classified 1 match and 65 non-matches
Purity of oracle classification: 0.985
Entropy of oracle classification: 0.113
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 66 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
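The oracle runs in this log are simulated with a configurable accuracy (here 100%, i.e. a perfect oracle). A minimal sketch, assuming each sampled vector carries its true label and that an answer is flipped with probability 1 − accuracy:

```python
import random

def simulated_oracle(true_labels, accuracy=1.0, rng=None):
    """Return oracle answers: each true label is kept with probability
    `accuracy` and flipped otherwise (accuracy=1.0 is a perfect oracle)."""
    rng = rng or random.Random(0)
    return [lab if rng.random() < accuracy else not lab
            for lab in true_labels]

answers = simulated_oracle([True, False, False, True], accuracy=1.0)
# a perfect oracle returns the true labels unchanged
```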
Reached end of manual classification budget
83.0
Analysing file: diverg(20)89_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 89), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)89_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)622_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 622), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)622_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1041
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1041 weight vectors
Containing 222 true matches and 819 true non-matches
(21.33% true matches)
Identified 987 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 950 (96.25%)
2 : 34 (3.44%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 987 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 798
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1040
Number of unique weight vectors: 987
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(987, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 987 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 987 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
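The purity and entropy reported for an oracle pass follow from the match/non-match counts alone. A minimal sketch, assuming purity is the majority-class fraction and entropy the binary Shannon entropy of the match proportion (both reproduce the 26/61 split above):

```python
import math

def purity_entropy(n_match, n_non_match):
    """Majority-class purity and binary Shannon entropy of a cluster."""
    total = n_match + n_non_match
    p = n_match / total                  # estimated match proportion
    purity = max(p, 1.0 - p)             # fraction in the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(26, 61)
print(f"{purity:.3f} {entropy:.3f}")  # 0.701 0.880
```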
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 900 weight vectors
Based on 26 matches and 61 non-matches
Classified 144 matches and 756 non-matches
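The SVM split step can be sketched with scikit-learn: train on the oracle-labelled vectors, then partition the remaining cluster by the predicted class. Kernel choice, parameters, and the synthetic stand-in data here are assumptions, not the tool's actual configuration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the 26 oracle-labelled matches and 61 non-matches.
X_match = rng.uniform(0.7, 1.0, size=(26, 7))
X_non = rng.uniform(0.0, 0.5, size=(61, 7))
X_train = np.vstack([X_match, X_non])
y_train = np.array([1] * 26 + [0] * 61)

clf = SVC(kernel="linear").fit(X_train, y_train)

# Remaining unlabelled weight vectors in the cluster (synthetic here).
X_rest = rng.uniform(0.0, 1.0, size=(900, 7))
pred = clf.predict(X_rest)
match_cluster = X_rest[pred == 1]       # predicted matches
non_match_cluster = X_rest[pred == 0]   # predicted non-matches
print(len(match_cluster), len(non_match_cluster))
```

The two predicted clusters are then re-queued, which is why the queue length grows to 2 in the next loop.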
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(756, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 144 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 144 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
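Farthest-first traversal, as used for these sample selections, greedily adds at each step the candidate whose nearest already-selected vector is farthest away. A sketch assuming Euclidean distance and an arbitrary first seed (the actual metric and seeding are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors (Euclidean distance)."""
    selected = [vectors[0]]              # assumed arbitrary seed
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the vector whose nearest selected vector is farthest away.
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)], 3)
print(sample)  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```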
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)300_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 300), dtype: object
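The precision, recall, and f-measure in this Series are consistent with its tp/fp/fn counts, as a quick check confirms:

```python
tp, fp, fn = 39, 0, 260

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(f"{precision:.0f} {recall:.6f} {f_measure:.6f}")  # 1 0.130435 0.230769
```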
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)300_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
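A frequency distribution like the one above can be computed with a plain Counter over the weight-vector tuples; the three vectors below are illustrative stand-ins:

```python
from collections import Counter

# Hypothetical weight vectors; the run above has 808 of them.
vectors = [
    (1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400),
    (1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400),
    (0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217),
]

occurrences = Counter(vectors)                 # vector -> how often it occurs
distribution = Counter(occurrences.values())   # occurrence count -> #vectors
total_unique = len(occurrences)

for count in sorted(distribution):
    n = distribution[count]
    print(f"{count} : {n} ({100.0 * n / total_unique:.2f}%)")
```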
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
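One reading of this filter: for each unique weight vector whose records are not all matches or all non-matches, drop the minority-class records (here a single record, taking 808 vectors down to 807). A sketch under that assumption, with hypothetical two-dimensional vectors:

```python
from collections import defaultdict

# (vector, is_match) records; hypothetical stand-ins for the real data.
records = [
    ((1.0, 1.0), True), ((1.0, 1.0), True), ((1.0, 1.0), False),  # pureness 0.667
    ((0.0, 0.0), False),                                          # pureness 0.000
]

groups = defaultdict(list)
for vec, is_match in records:
    groups[vec].append(is_match)

kept = []
for vec, labels in groups.items():
    matches = sum(labels)
    pureness = matches / len(labels)
    if 0.0 < pureness < 1.0:
        # Non-pure vector: keep only the majority-class records.
        majority = matches >= len(labels) - matches
        kept += [(vec, majority)] * max(matches, len(labels) - matches)
    else:
        kept += [(vec, lab) for lab in labels]

print(len(kept))  # 3: one minority record removed
```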
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 28 matches and 57 non-matches
Classified 141 matches and 543 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 141 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 141 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)263_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 263), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)263_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 880
Number of entity ID pairs that occurred more than once: 0
Analyse set of 880 weight vectors
Containing 208 true matches and 672 true non-matches
(23.64% true matches)
Identified 828 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 792 (95.65%)
2 : 33 (3.99%)
3 : 2 (0.24%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 828 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 176
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 651
Removed 1 non-pure weight vector
Final number of weight vectors to use: 879
Number of unique weight vectors: 828
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(828, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 828 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 828 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 32 matches and 54 non-matches
Purity of oracle classification: 0.628
Entropy of oracle classification: 0.952
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 742 weight vectors
Based on 32 matches and 54 non-matches
Classified 168 matches and 574 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(168, 0.627906976744186, 0.9522656254366642, 0.37209302325581395)
(574, 0.627906976744186, 0.9522656254366642, 0.37209302325581395)
Current size of match and non-match training data sets: 32 / 54
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 168 weight vectors
- Estimated match proportion 0.372
Sample size for this cluster: 59
Farthest first selection of 59 weight vectors from 168 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.875, 1.000, 0.182, 0.267, 0.237, 0.206, 0.167] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 59 weight vectors
The oracle will correctly classify 59 weight vectors and wrongly classify 0
Classified 47 matches and 12 non-matches
Purity of oracle classification: 0.797
Entropy of oracle classification: 0.729
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 12
Number of false non-matches: 0
Deleted 59 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing the file: diverg(20)285_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 285), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)285_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 831
Number of entity ID pairs that occurred more than once: 0
Analyse set of 831 weight vectors
Containing 227 true matches and 604 true non-matches
(27.32% true matches)
Identified 774 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 737 (95.22%)
2 : 34 (4.39%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 774 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 583
Removed 1 non-pure weight vector
Final number of weight vectors to use: 830
Number of unique weight vectors: 774
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(774, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 774 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 774 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
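Farthest-first selection as used above can be sketched like this; it is a generic implementation of the heuristic, since the tool's exact distance measure and starting rule are not shown in the log:

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Pick k vectors by farthest-first traversal: repeatedly add the
    vector whose distance to its nearest already-selected vector is
    largest, so the sample spreads over the whole cluster."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]
    # Distance of every vector to the nearest selected vector so far
    min_dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Only this spread-out sample is then passed to the oracle, which keeps the manual labelling budget small while still covering the cluster.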
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
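The purity and entropy figures reported for an oracle sample follow the standard two-class definitions; a sketch consistent with the numbers above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction of the sample; entropy is
    the base-2 Shannon entropy of the match/non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```

For the sample above (28 matches, 57 non-matches) this gives purity 0.671 and entropy 0.914, matching the logged values.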
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 689 weight vectors
Based on 28 matches and 57 non-matches
Classified 151 matches and 538 non-matches
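The split step could be reproduced roughly as below, training an SVM on the oracle-labelled sample and dividing the remaining weight vectors into two child clusters. This is a sketch using scikit-learn; the kernel and parameters actually used by the tool are not shown in the log:

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Fit an SVM on the oracle-classified sample and split the
    unclassified vectors into predicted matches and non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(np.asarray(train_vectors), np.asarray(train_labels))
    predictions = clf.predict(np.asarray(remaining_vectors))
    matches = [v for v, p in zip(remaining_vectors, predictions) if p]
    non_matches = [v for v, p in zip(remaining_vectors, predictions) if not p]
    return matches, non_matches
```

Both predicted subsets are then pushed back onto the queue, which is why the next loop reports a queue length of 2.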
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(538, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 151 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 51 matches and 3 non-matches
Purity of oracle classification: 0.944
Entropy of oracle classification: 0.310
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)280_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (15, 1 - acm diverg, 280), dtype: object
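The precision, recall and F-measure values in these per-run summaries follow directly from the tp/fp/fn counts; a small check against the figures above:

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure from raw match counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts from the summary above: tp=47, fp=1, fn=252
p, r, f = precision_recall_f(47, 1, 252)
# p ~ 0.979167, r ~ 0.157191, f ~ 0.270893 -- matching the logged values
```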
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)280_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 397
Number of entity ID pairs that occurred more than once: 0
Analyse set of 397 weight vectors
Containing 210 true matches and 187 true non-matches
(52.90% true matches)
Identified 362 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 345 (95.30%)
2 : 14 (3.87%)
3 : 2 (0.55%)
18 : 1 (0.28%)
Identified 1 non-pure unique weight vector (from 362 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 184
Removed 1 non-pure weight vector
Final number of weight vectors to use: 396
Number of unique weight vectors: 362
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(362, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 362 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 362 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 30 matches and 46 non-matches
Purity of oracle classification: 0.605
Entropy of oracle classification: 0.968
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 46
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 286 weight vectors
Based on 30 matches and 46 non-matches
Classified 140 matches and 146 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6052631578947368, 0.9677884628267679, 0.39473684210526316)
(146, 0.6052631578947368, 0.9677884628267679, 0.39473684210526316)
Current size of match and non-match training data sets: 30 / 46
Selected cluster with (queue ordering: random):
- Purity 0.61 and entropy 0.97
- Size 140 weight vectors
- Estimated match proportion 0.395
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 140 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 53 matches and 3 non-matches
Purity of oracle classification: 0.946
Entropy of oracle classification: 0.301
Number of true matches: 53
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
48.0
Analysing the file: diverg(15)938_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 938), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)938_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1051
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1051 weight vectors
Containing 223 true matches and 828 true non-matches
(21.22% true matches)
Identified 997 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 960 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 997 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 807
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1050
Number of unique weight vectors: 997
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(997, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 997 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 997 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 910 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 792 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(792, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 792 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 792 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.736, 1.000, 0.250, 0.290, 0.172, 0.188, 0.286] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 14 matches and 59 non-matches
Purity of oracle classification: 0.808
Entropy of oracle classification: 0.705
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
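The purity and entropy figures reported above are consistent with the majority-class fraction and the binary Shannon entropy of the match proportion. A minimal sketch (the function name is ours, not from the program):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary Shannon
    entropy (in bits) of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(num_matches, num_non_matches) / total
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Figures from the run above: 14 matches, 59 non-matches
purity, entropy = purity_entropy(14, 59)
print(round(purity, 3), round(entropy, 3))  # → 0.808 0.705
```

With the 14 matches and 59 non-matches classified in this run, this reproduces the logged 0.808 / 0.705.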
42.0
Analyzing file: diverg(10)376_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.982143
recall 0.183946
f-measure 0.309859
da 56
dm 0
ndm 0
tp 55
fp 1
tn 4.76529e+07
fn 244
Name: (10, 1 - acm diverg, 376), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)376_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 446
Number of entity ID pairs that occurred more than once: 0
Analyse set of 446 weight vectors
Containing 205 true matches and 241 true non-matches
(45.96% true matches)
Identified 413 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 399 (96.61%)
2 : 11 (2.66%)
3 : 2 (0.48%)
19 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 413 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 240
Removed 1 non-pure weight vector
Final number of weight vectors to use: 445
Number of unique weight vectors: 413
Time to load and analyse the weight vector file: 0.01 sec
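The "Frequency distribution" and "Pureness" statistics above can be derived by grouping identical weight vectors, assuming each vector carries a boolean match label; a sketch (the helper name is ours):

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors and report the statistics printed
    in the log: how often each unique vector occurs, and its 'pureness'
    (fraction of matches among its copies)."""
    groups = {}
    for vec, is_match in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(is_match)

    # Occurrence : number of unique weight vectors that occur that often
    occ_dist = Counter(len(g) for g in groups.values())

    # Pureness per unique vector; 0 < pureness < 1 means non-pure, and
    # the log removes the minority-class copies of such vectors
    pureness = {v: sum(g) / len(g) for v, g in groups.items()}
    non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]
    return occ_dist, pureness, non_pure
```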
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(413, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 413 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 413 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 34 matches and 44 non-matches
Purity of oracle classification: 0.564
Entropy of oracle classification: 0.988
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 335 weight vectors
Based on 34 matches and 44 non-matches
Classified 133 matches and 202 non-matches
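The SVM step above trains on the oracle-labelled sample and partitions the remaining vectors of the cluster into predicted matches and non-matches. A sketch using scikit-learn (the kernel and parameters are our assumptions, not taken from the program):

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, remaining):
    """Split the unlabelled remainder of a cluster with an SVM trained
    on the oracle-classified sample (sketch; kernel choice assumed)."""
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)  # labels: 1 = match, 0 = non-match
    pred = clf.predict(remaining)
    matches = [v for v, p in zip(remaining, pred) if p]
    non_matches = [v for v, p in zip(remaining, pred) if not p]
    return matches, non_matches
```

The two resulting sub-clusters are then pushed back onto the queue, as the Loop 2 queue lengths above show.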
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.5641025641025641, 0.9881108365218301, 0.4358974358974359)
(202, 0.5641025641025641, 0.9881108365218301, 0.4358974358974359)
Current size of match and non-match training data sets: 34 / 44
Selected cluster (queue ordering: random) with:
- Purity 0.56 and entropy 0.99
- Size 202 weight vectors
- Estimated match proportion 0.436
Sample size for this cluster: 64
Farthest first selection of 64 weight vectors from 202 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.929, 1.000, 0.182, 0.238, 0.188, 0.146, 0.270] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.367, 1.000, 0.160, 0.170, 0.077, 0.200, 0.178] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 0.000, 0.483, 0.818, 0.773, 0.478, 0.727] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 7 matches and 57 non-matches
Purity of oracle classification: 0.891
Entropy of oracle classification: 0.498
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
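The repeated "Farthest first selection" steps in these runs follow the standard farthest-first traversal: greedily add the vector with the largest minimum distance to those already chosen. A sketch (Euclidean distance and first-vector seeding are assumptions):

```python
import math

def farthest_first(vectors, k):
    """Greedily select k vectors, each maximising its minimum Euclidean
    distance to the vectors already selected."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # assumed seed: the first vector
    min_d = [dist(vectors[0], v) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        i = max(range(len(vectors)), key=lambda j: min_d[j])
        selected.append(vectors[i])
        # Tighten each vector's distance to the nearest selected vector
        for j, v in enumerate(vectors):
            d = dist(vectors[i], v)
            if d < min_d[j]:
                min_d[j] = d
    return selected
```

This spreads the oracle's labelling budget across the cluster instead of concentrating it in one region, which is why the selected samples above mix very different weight profiles.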
56.0
Analyzing file: diverg(15)762_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 762), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)762_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 722
Number of entity ID pairs that occurred more than once: 0
Analyse set of 722 weight vectors
Containing 217 true matches and 505 true non-matches
(30.06% true matches)
Identified 667 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 631 (94.60%)
2 : 33 (4.95%)
3 : 2 (0.30%)
19 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 667 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 484
Removed 1 non-pure weight vector
Final number of weight vectors to use: 721
Number of unique weight vectors: 667
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(667, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 667 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 667 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 23 matches and 61 non-matches
Purity of oracle classification: 0.726
Entropy of oracle classification: 0.847
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 583 weight vectors
Based on 23 matches and 61 non-matches
Classified 0 matches and 583 non-matches
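Taken together, each run follows the same loop: pop a cluster from the queue, sample it farthest-first, label the sample with the oracle, and, if the cluster is still impure and splittable, split the remainder with an SVM. A high-level sketch in which every helper callable is a hypothetical stand-in for the corresponding logged step:

```python
def cluster_loop(queue, budget, oracle, sample_size, select_far,
                 svm_split, pure_enough):
    """High-level sketch of the logged loop; all callables are
    stand-ins for the steps shown in the log, not the original code."""
    used = 0                      # manual oracle classifications performed
    matches, non_matches = [], []
    while queue and used < budget:
        cluster = queue.pop()     # queue ordering: random in the log
        sample = select_far(cluster, sample_size(cluster))
        labels = [oracle(v) for v in sample]
        used += len(sample)
        matches += [v for v, m in zip(sample, labels) if m]
        non_matches += [v for v, m in zip(sample, labels) if not m]
        rest = [v for v in cluster if v not in sample]
        if rest and not pure_enough(labels):
            pred_m, pred_n = svm_split(matches, non_matches, rest)
            queue += [c for c in (pred_m, pred_n) if c]
    return matches, non_matches
```

The labelled matches and non-matches accumulate as the training data sets whose sizes the log reports at each loop; the run above ends early because the SVM assigned all 583 remaining vectors to one side.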
40.0
Analyzing file: diverg(10)612_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 612), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)612_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 656
Number of entity ID pairs that occurred more than once: 0
Analyse set of 656 weight vectors
Containing 215 true matches and 441 true non-matches
(32.77% true matches)
Identified 623 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 607 (97.43%)
2 : 13 (2.09%)
3 : 2 (0.32%)
17 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 623 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 440
Removed 1 non-pure weight vector
Final number of weight vectors to use: 655
Number of unique weight vectors: 623
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(623, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 623 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 623 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.364, 0.619, 0.471, 0.600, 0.533] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 32 matches and 51 non-matches
Purity of oracle classification: 0.614
Entropy of oracle classification: 0.962
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 540 weight vectors
Based on 32 matches and 51 non-matches
Classified 146 matches and 394 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6144578313253012, 0.9618624139909456, 0.3855421686746988)
(394, 0.6144578313253012, 0.9618624139909456, 0.3855421686746988)
Current size of match and non-match training data sets: 32 / 51
Selected cluster (queue ordering: random) with:
- Purity 0.61 and entropy 0.96
- Size 146 weight vectors
- Estimated match proportion 0.386
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 146 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.938, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 51 matches and 5 non-matches
Purity of oracle classification: 0.911
Entropy of oracle classification: 0.434
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
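The purity and entropy figures reported in the oracle step above follow the standard majority-fraction and binary Shannon-entropy definitions. A minimal sketch (the helper names are mine, not the tool's) reproduces the values logged for 51 matches and 5 non-matches:

```python
import math

def purity(num_matches, num_non_matches):
    """Fraction of classified vectors that fall in the majority class."""
    total = num_matches + num_non_matches
    return max(num_matches, num_non_matches) / total

def entropy(num_matches, num_non_matches):
    """Shannon entropy (base 2) of the match / non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    if p in (0.0, 1.0):  # a pure cluster has zero entropy
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Reproduces the figures logged above: 51 matches, 5 non-matches
print(round(purity(51, 5), 3))   # 0.911
print(round(entropy(51, 5), 3))  # 0.434
```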
42.0
Analysing file: diverg(10)511_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 511), dtype: object
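The precision, recall, and f-measure in the summary above are consistent with the logged tp/fp/fn counts under the usual definitions; a quick check:

```python
# Counts from the log entry above
tp, fp, fn = 68, 1, 231

precision = tp / (tp + fp)  # 68 / 69
recall = tp / (tp + fn)     # 68 / 299
f_measure = 2 * precision * recall / (precision + recall)

print(round(precision, 6))  # 0.985507
print(round(recall, 6))     # 0.227425
print(round(f_measure, 6))  # 0.369565
```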
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)511_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 617
Number of entity ID pairs that occurred more than once: 0
Analyse set of 617 weight vectors
Containing 186 true matches and 431 true non-matches
(30.15% true matches)
Identified 577 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 543 (94.11%)
2 : 31 (5.37%)
3 : 2 (0.35%)
6 : 1 (0.17%)
Identified 0 non-pure unique weight vectors (from 577 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.000 : 411
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 617
Number of unique weight vectors: 577
Time to load and analyse the weight vector file: 0.01 sec
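The uniqueness and occurrence statistics above amount to counting duplicate vector tuples; a sketch, assuming weight vectors are plain lists of floats (toy data here, not the logged file):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    vector_counts = Counter(tuple(v) for v in weight_vectors)
    return Counter(vector_counts.values())

vectors = [[0.5, 1.0], [0.5, 1.0], [0.3, 0.2], [1.0, 1.0]]
print(len(set(map(tuple, vectors))))        # 3 unique vectors
print(dict(occurrence_distribution(vectors)))  # {2: 1, 1: 2}
```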
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(577, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 577 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 577 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
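The "far" method used above is a farthest-first traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest. A pure-Python sketch under assumed choices (Euclidean distance, first vector as seed; the log does not show the tool's exact metric or seeding):

```python
import math

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: pick k mutually distant vectors."""
    selected = [seed]  # assumed: start from an arbitrary first vector
    # minimum distance from each vector to the selected set so far
    min_dist = [math.dist(v, vectors[seed]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [min(d, math.dist(v, vectors[nxt]))
                    for d, v in zip(min_dist, vectors)]
    return selected

pts = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(farthest_first(pts, 3))  # [0, 2, 3]
```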
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 32 matches and 50 non-matches
Purity of oracle classification: 0.610
Entropy of oracle classification: 0.965
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 495 weight vectors
Based on 32 matches and 50 non-matches
Classified 157 matches and 338 non-matches
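The split step above trains an SVM on the oracle-labelled vectors and divides the remaining cluster by predicted label; a scikit-learn sketch (the kernel and parameters are assumptions, the log does not show them):

```python
from sklearn import svm

def split_cluster(labeled_vectors, labels, unlabeled_vectors):
    """Train an SVM on oracle-labelled vectors, split the rest by prediction."""
    clf = svm.SVC(kernel="linear")  # assumed kernel, not confirmed by the log
    clf.fit(labeled_vectors, labels)
    preds = clf.predict(unlabeled_vectors)
    matches = [v for v, p in zip(unlabeled_vectors, preds) if p]
    non_matches = [v for v, p in zip(unlabeled_vectors, preds) if not p]
    return matches, non_matches

train_X = [[1.0, 1.0], [0.9, 0.8], [0.1, 0.2], [0.2, 0.1]]
train_y = [1, 1, 0, 0]  # 1 = match, 0 = non-match
m, nm = split_cluster(train_X, train_y, [[0.95, 0.9], [0.15, 0.1]])
print(len(m), len(nm))  # 1 1
```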
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(157, 0.6097560975609756, 0.9649567669505688, 0.3902439024390244)
(338, 0.6097560975609756, 0.9649567669505688, 0.3902439024390244)
Current size of match and non-match training data sets: 32 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.61 and entropy 0.96
- Size 157 weight vectors
- Estimated match proportion 0.390
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 157 vectors
The selected farthest weight vectors are:
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.875, 1.000, 0.182, 0.267, 0.237, 0.206, 0.167] (False)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 47 matches and 11 non-matches
Purity of oracle classification: 0.810
Entropy of oracle classification: 0.701
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 11
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing file: diverg(15)481_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 481), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)481_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 727
Number of entity ID pairs that occurred more than once: 0
Analyse set of 727 weight vectors
Containing 209 true matches and 518 true non-matches
(28.75% true matches)
Identified 693 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 676 (97.55%)
2 : 14 (2.02%)
3 : 2 (0.29%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 693 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 515
Removed 1 non-pure weight vector
Final number of weight vectors to use: 726
Number of unique weight vectors: 693
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(693, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 693 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 693 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 609 weight vectors
Based on 31 matches and 53 non-matches
Classified 142 matches and 467 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(467, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 467 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 467 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 3 matches and 72 non-matches
Purity of oracle classification: 0.960
Entropy of oracle classification: 0.242
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(15)833_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 833), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)833_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 724
Number of entity ID pairs that occurred more than once: 0
Analyse set of 724 weight vectors
Containing 219 true matches and 505 true non-matches
(30.25% true matches)
Identified 688 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 672 (97.67%)
2 : 13 (1.89%)
3 : 2 (0.29%)
20 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 688 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 504
Removed 1 non-pure weight vector
Final number of weight vectors to use: 723
Number of unique weight vectors: 688
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(688, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 688 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 688 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 28 matches and 56 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
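The purity and entropy reported for each oracle classification are consistent with the standard binary definitions: purity is the majority-class fraction, entropy the Shannon entropy of the match/non-match split. A minimal sketch (the tool's actual implementation is not shown in this log):

```python
import math

def purity_and_entropy(num_matches: int, num_non_matches: int):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy of the match/non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# 28 matches and 56 non-matches, as classified by the oracle above
purity, entropy = purity_and_entropy(28, 56)
print(round(purity, 3), round(entropy, 3))  # 0.667 0.918
```

These reproduce the 0.667 / 0.918 figures printed above.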
SVM classification of 604 weight vectors
Based on 28 matches and 56 non-matches
Classified 140 matches and 464 non-matches
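The "SVM classification" step trains on the oracle-labelled vectors and then splits the remaining cluster by predicted class. The log does not reveal the SVM implementation or kernel; as an illustration only, here is a linear SVM trained with Pegasos-style subgradient descent on hypothetical, well-separated toy data:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient training of a linear SVM.
    Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w + b) < 1:      # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:
                w = (1 - eta * lam) * w
    return w, b

# Toy stand-in for the 28 matches / 56 non-matches used as training data:
# matches cluster near 1.0, non-matches near low similarities.
rng = np.random.default_rng(1)
matches = 0.8 + 0.2 * rng.random((28, 7))
non_matches = 0.4 * rng.random((56, 7))
X = np.vstack([matches, non_matches])
y = np.array([1] * 28 + [-1] * 56)

w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)   # high accuracy expected on separable toy data
```

In the log, the resulting predicted matches and non-matches form the two child clusters (140 and 464 vectors) pushed back onto the queue.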
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(464, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 28 / 56
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.92
- Size 464 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 464 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.462, 0.609, 0.643, 0.706, 0.786] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.533, 0.667, 0.333, 0.714, 0.632] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 10 matches and 62 non-matches
Purity of oracle classification: 0.861
Entropy of oracle classification: 0.581
Number of true matches: 10
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
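The recurring "farthest first selection" can be sketched as a greedy farthest-point traversal: start from one vector, then repeatedly pick the vector whose minimum distance to the already-selected set is largest. The seeding strategy and the Euclidean metric are assumptions, as the log does not state them:

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Greedy farthest-first traversal over a pool of weight vectors."""
    rng = np.random.default_rng(seed)
    n = len(vectors)
    selected = [rng.integers(n)]                 # assumed random start
    # distance of every vector to the closest selected vector so far
    dists = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))              # farthest from the set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return vectors[selected]

# e.g. pick 84 of 704 seven-dimensional weight vectors, as in the log
pool = np.random.default_rng(2).random((704, 7))
sample = farthest_first(pool, 84)
print(sample.shape)  # prints (84, 7)
```

Selected vectors have distance 0 to themselves, so they are never picked twice; the sample therefore spreads across the extremes of the cluster, which is why the printed selections mix clear matches and clear non-matches.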
Analysing file: diverg(20)619_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (20, 1 - acm diverg, 619), dtype: object
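The precision, recall and f-measure values printed for each file follow directly from the confusion counts (tp, fp, fn); a quick check with the numbers above:

```python
def prf(tp: int, fp: int, fn: int):
    """Standard precision / recall / F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# tp=67, fp=1, fn=232, as in the series above
p, r, f = prf(67, 1, 232)
print(round(p, 6), round(r, 6), round(f, 6))  # 0.985294 0.22408 0.365123
```

These match the precision 0.985294, recall 0.22408 and f-measure 0.365123 reported in the series.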
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)619_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 728
Number of entity ID pairs that occurred more than once: 0
Analyse set of 728 weight vectors
Containing 197 true matches and 531 true non-matches
(27.06% true matches)
Identified 704 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 687 (97.59%)
2 : 14 (1.99%)
3 : 2 (0.28%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 704 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.000 : 529
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 728
Number of unique weight vectors: 704
Time to load and analyse the weight vector file: 0.04 sec
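The frequency-distribution and pureness statistics printed after each load can be reproduced with simple counting: tally how often each unique vector occurs, then compute the fraction of its occurrences labelled as matches. A minimal sketch on toy data (the helper name is hypothetical):

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Frequency and pureness analysis of a labelled weight-vector set,
    mirroring the statistics in the log. Vectors are tuples so they can
    be counted; labels are True (match) / False (non-match)."""
    freq = Counter(vectors)
    occurrence_dist = Counter(freq.values())   # occurrence -> how many vectors

    # pureness of a unique vector = fraction of its occurrences that are matches
    match_counts = Counter(v for v, m in zip(vectors, labels) if m)
    pureness = {v: match_counts[v] / c for v, c in freq.items()}
    non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]
    return occurrence_dist, non_pure

vecs = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9)]
labels = [True, False, False, True]
dist, non_pure = analyse_weight_vectors(vecs, labels)
print(dist)        # Counter({1: 2, 2: 1})
print(non_pure)    # [(1.0, 0.5)] -- occurs twice with conflicting labels
```

A vector with pureness strictly between 0 and 1 carries contradictory labels; as the later runs show, the tool drops the minority-class copies of such vectors before clustering.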
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(704, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 704 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 704 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 30 matches and 54 non-matches
Purity of oracle classification: 0.643
Entropy of oracle classification: 0.940
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 620 weight vectors
Based on 30 matches and 54 non-matches
Classified 131 matches and 489 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
(489, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
Current size of match and non-match training data sets: 30 / 54
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 131 weight vectors
- Estimated match proportion 0.357
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 49 matches and 4 non-matches
Purity of oracle classification: 0.925
Entropy of oracle classification: 0.386
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)200_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 200), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)200_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 607
Number of entity ID pairs that occurred more than once: 0
Analyse set of 607 weight vectors
Containing 192 true matches and 415 true non-matches
(31.63% true matches)
Identified 571 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 554 (97.02%)
2 : 14 (2.45%)
3 : 2 (0.35%)
19 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 571 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 158
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 412
Removed 1 non-pure weight vector
Final number of weight vectors to use: 606
Number of unique weight vectors: 571
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(571, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 571 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 571 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.500, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 30 matches and 52 non-matches
Purity of oracle classification: 0.634
Entropy of oracle classification: 0.947
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 489 weight vectors
Based on 30 matches and 52 non-matches
Classified 132 matches and 357 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(132, 0.6341463414634146, 0.9474351361840306, 0.36585365853658536)
(357, 0.6341463414634146, 0.9474351361840306, 0.36585365853658536)
Current size of match and non-match training data sets: 30 / 52
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 132 weight vectors
- Estimated match proportion 0.366
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 132 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)27_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 27), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)27_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 673
Number of entity ID pairs that occurred more than once: 0
Analyse set of 673 weight vectors
Containing 181 true matches and 492 true non-matches
(26.89% true matches)
Identified 652 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 642 (98.47%)
2 : 7 (1.07%)
3 : 2 (0.31%)
11 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 652 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 160
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 491
Removed 1 non-pure weight vector
Final number of weight vectors to use: 672
Number of unique weight vectors: 652
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(652, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 652 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 652 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
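The "far" initial selection above is a greedy farthest-first traversal: starting from one vector, repeatedly add the vector whose minimum distance to the selected set is largest. A minimal sketch with 2-d toy points (the run uses the 7-d similarity vectors listed above; Euclidean distance is assumed):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    minimum Euclidean distance to the already-selected set is largest."""
    selected = [vectors[start]]
    # Track each vector's distance to its nearest selected vector.
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: min_dist[j])
        selected.append(vectors[i])
        min_dist = [min(d, math.dist(v, vectors[i]))
                    for d, v in zip(min_dist, vectors)]
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
print(farthest_first(pts, 3))
```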
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 32 matches and 51 non-matches
Purity of oracle classification: 0.614
Entropy of oracle classification: 0.962
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
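The purity and entropy figures above are the majority-class fraction and the binary Shannon entropy of the oracle's labels; with 32 matches and 51 non-matches this reproduces the 0.614 / 0.962 reported. A minimal sketch:

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Majority-class purity and binary Shannon entropy of a cluster's labels."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(32, 51)
print(round(purity, 3), round(entropy, 3))
```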
Cluster not pure enough or too large, and can be split further
SVM classification of 569 weight vectors
Based on 32 matches and 51 non-matches
Classified 132 matches and 437 non-matches
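The SVM step above trains on the oracle-labelled sample and splits the remaining unlabelled weight vectors into predicted matches and non-matches. A sketch with scikit-learn on toy 2-d data standing in for the 7-d similarity vectors (the kernel and other settings are assumptions; the log does not state them):

```python
from sklearn.svm import SVC

# Oracle-labelled sample: similarity vectors with 1 = match, 0 = non-match.
labelled = [(0.9, 0.9), (0.8, 1.0), (0.1, 0.2), (0.2, 0.1)]
labels = [1, 1, 0, 0]

clf = SVC(kernel="linear").fit(labelled, labels)

# Remaining unlabelled vectors are classified to split the cluster.
unlabelled = [(0.85, 0.95), (0.15, 0.05)]
pred = clf.predict(unlabelled)
print(pred.tolist())
```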
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(132, 0.6144578313253012, 0.9618624139909456, 0.3855421686746988)
(437, 0.6144578313253012, 0.9618624139909456, 0.3855421686746988)
Current size of match and non-match training data sets: 32 / 51
Selected cluster with (queue ordering: random):
- Purity 0.61 and entropy 0.96
- Size 132 weight vectors
- Estimated match proportion 0.386
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 132 vectors
The selected farthest weight vectors are:
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 46 matches and 8 non-matches
Purity of oracle classification: 0.852
Entropy of oracle classification: 0.605
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(20)993_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 993), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)993_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1100
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1100 weight vectors
Containing 227 true matches and 873 true non-matches
(20.64% true matches)
Identified 1043 unique weight vectors
Frequency distribution of occurences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 1006 (96.45%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1043 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1099
Number of unique weight vectors: 1043
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1043, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1043 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1043 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 955 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 846 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(846, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 846 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 846 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)993_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984375
recall 0.210702
f-measure 0.347107
da 64
dm 0
ndm 0
tp 63
fp 1
tn 4.76529e+07
fn 236
Name: (15, 1 - acm diverg, 993), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)993_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1012
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1012 weight vectors
Containing 202 true matches and 810 true non-matches
(19.96% true matches)
Identified 962 unique weight vectors
Frequency distribution of occurences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 928 (96.47%)
2 : 31 (3.22%)
3 : 2 (0.21%)
16 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 962 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 789
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1011
Number of unique weight vectors: 962
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(962, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 962 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 962 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 875 weight vectors
Based on 28 matches and 59 non-matches
Classified 138 matches and 737 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(138, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(737, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 138 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 138 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 47 matches and 5 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.457
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
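The purity and entropy values reported above follow from the match proportion of the classified sample (purity is the majority-class fraction, entropy the binary Shannon entropy). A minimal sketch, assuming this standard definition (function name is illustrative, not from the original tool):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity and binary entropy of a set of classified weight vectors."""
    total = num_matches + num_non_matches
    p = num_matches / total          # proportion of matches
    purity = max(p, 1.0 - p)         # majority-class fraction
    # Binary Shannon entropy; 0 * log2(0) is taken as 0.
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

For the block above, `purity_entropy(47, 5)` gives purity 0.904 and entropy 0.457, matching the logged values.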
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
64.0
Analysing the file: diverg(20)643_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 643), dtype: object
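The precision, recall, and f-measure values in these summary rows are consistent with the standard definitions over the tp, fp, and fn counts. A sketch under that assumption (not code from the original pipeline):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With the counts above, `prf(57, 0, 242)` reproduces precision 1, recall 0.190635, and f-measure 0.320225.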
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)643_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
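The distribution above counts how often each identical weight vector occurs, then tallies those counts. A sketch using only the standard library (the original tool's implementation may differ):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct weight vectors with that count."""
    vector_counts = Counter(tuple(v) for v in vectors)  # how often each vector occurs
    return Counter(vector_counts.values())              # distribution of those counts
```

For the run above this would yield {1: 1001, 2: 32, 3: 2, 12: 1} over the 1036 unique vectors.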
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
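The "far" selection logged above can be sketched as a greedy farthest-first traversal: each new vector maximises its minimum distance to those already selected. An illustrative implementation, assuming Euclidean distance and an arbitrary start vector (the original tool may seed and measure differently):

```python
import math

def farthest_first(vectors, k):
    """Greedily select k vectors, each maximising its minimum
    Euclidean distance to the vectors already selected."""
    selected = [vectors[0]]            # arbitrary starting point
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The greedy rule spreads the sample across the weight-vector space, which is why the selected vectors above mix clear matches and clear non-matches.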
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
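The SVM step trains on the oracle-labelled vectors and splits the remaining cluster into predicted matches and non-matches, which become the two queue entries of the next loop. A minimal sketch, assuming scikit-learn's `SVC` with a linear kernel (the original tool's kernel and parameters are not shown in the log):

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(match_train, non_match_train, cluster_vectors):
    """Train an SVM on the oracle-labelled weight vectors and split the
    remaining cluster into predicted matches and non-matches."""
    X = np.vstack([match_train, non_match_train])
    y = np.array([1] * len(match_train) + [0] * len(non_match_train))
    clf = SVC(kernel="linear").fit(X, y)
    preds = clf.predict(np.asarray(cluster_vectors))
    matches = [v for v, p in zip(cluster_vectors, preds) if p == 1]
    non_matches = [v for v, p in zip(cluster_vectors, preds) if p == 0]
    return matches, non_matches
```

Here the split of 948 vectors into 101 predicted matches and 847 predicted non-matches is based on only 23 + 65 labelled training vectors.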
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)596_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 596), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)596_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 945
Number of entity ID pairs that occurred more than once: 0
Analyse set of 945 weight vectors
Containing 219 true matches and 726 true non-matches
(23.17% true matches)
Identified 890 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 854 (95.96%)
2 : 33 (3.71%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 890 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 705
Removed 1 non-pure weight vector
Final number of weight vectors to use: 944
Number of unique weight vectors: 890
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(890, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 890 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 890 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 804 weight vectors
Based on 24 matches and 62 non-matches
Classified 130 matches and 674 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(130, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(674, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 674 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 674 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 11 matches and 58 non-matches
Purity of oracle classification: 0.841
Entropy of oracle classification: 0.633
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(10)169_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.197324
f-measure 0.329609
da 59
dm 0
ndm 0
tp 59
fp 0
tn 4.76529e+07
fn 240
Name: (10, 1 - acm diverg, 169), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)169_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 364
Number of entity ID pairs that occurred more than once: 0
Analyse set of 364 weight vectors
Containing 193 true matches and 171 true non-matches
(53.02% true matches)
Identified 338 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 323 (95.56%)
2 : 12 (3.55%)
3 : 2 (0.59%)
11 : 1 (0.30%)
Identified 1 non-pure unique weight vector (from 338 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 168
Removed 1 non-pure weight vector
Final number of weight vectors to use: 363
Number of unique weight vectors: 338
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(338, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 338 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 75
Perform initial selection using "far" method
Farthest first selection of 75 weight vectors from 338 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
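The "farthest first selection" performed above can be sketched as a Gonzalez-style greedy traversal: repeatedly pick the vector whose distance to the already-selected set is largest. This is a minimal sketch; the Euclidean metric, the seed choice, and the function name `farthest_first` are assumptions, since the log does not show the implementation.

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Select k indices from `vectors`, each maximally far from those already chosen."""
    selected = [seed_index]
    # distance of every vector to its nearest already-selected vector
    dist = [math.dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < k:
        far = max(range(len(vectors)), key=lambda i: dist[i])
        selected.append(far)
        # update each vector's distance to the selected set
        for i, v in enumerate(vectors):
            dist[i] = min(dist[i], math.dist(v, vectors[far]))
    return selected
```

Farthest-first sampling spreads the manually labelled sample across the whole weight-vector space, which is why the selected vectors above mix clear matches and clear non-matches.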
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and misclassify 0
Classified 31 matches and 44 non-matches
Purity of oracle classification: 0.587
Entropy of oracle classification: 0.978
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
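The purity and entropy figures reported for the oracle-labelled sample are the majority-class fraction and the binary Shannon entropy of the match proportion. A minimal sketch (the function name is an assumption) reproduces the numbers above for 31 matches and 44 non-matches:

```python
import math

def purity_entropy(n_matches, n_non_matches):
    total = n_matches + n_non_matches
    p = n_matches / total                # estimated match proportion
    purity = max(p, 1.0 - p)             # fraction of samples in the majority class
    entropy = 0.0                        # binary Shannon entropy, in bits
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy, p

purity, entropy, prop = purity_entropy(31, 44)
# purity ≈ 0.587, entropy ≈ 0.978, match proportion ≈ 0.413, as in the log
```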
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 263 weight vectors
Based on 31 matches and 44 non-matches
Classified 140 matches and 123 non-matches
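The SVM step that splits the remaining 263 vectors into two new clusters can be sketched with scikit-learn; the log only says "SVM", so the linear kernel and the scikit-learn API are assumptions:

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train on the oracle-labelled sample, then split the rest by prediction."""
    clf = SVC(kernel="linear")
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches   # the two new clusters pushed onto the queue
```

Both resulting clusters inherit the purity/entropy estimate of the labelled sample they were split from, which is why the two queue entries in Loop 2 show identical statistics.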
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 75
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.5866666666666667, 0.9782176659354248, 0.41333333333333333)
(123, 0.5866666666666667, 0.9782176659354248, 0.41333333333333333)
Current size of match and non-match training data sets: 31 / 44
Selected cluster with (queue ordering: random):
- Purity 0.59 and entropy 0.98
- Size 140 weight vectors
- Estimated match proportion 0.413
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 140 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and misclassify 0
Classified 49 matches and 7 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
59.0
Analysing file: diverg(10)67_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984615
recall 0.214047
f-measure 0.351648
da 65
dm 0
ndm 0
tp 64
fp 1
tn 4.76529e+07
fn 235
Name: (10, 1 - acm diverg, 67), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)67_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 499
Number of entity ID pairs that occurred more than once: 0
Analyse set of 499 weight vectors
Containing 179 true matches and 320 true non-matches
(35.87% true matches)
Identified 473 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 461 (97.46%)
2 : 9 (1.90%)
3 : 2 (0.42%)
14 : 1 (0.21%)
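The occurrence histogram above can be reproduced with two passes of `collections.Counter`: count how often each exact weight vector repeats, then tabulate a frequency-of-frequencies. A minimal sketch (function name assumed):

```python
from collections import Counter

def occurrence_distribution(vectors):
    per_vector = Counter(map(tuple, vectors))      # weight vector -> occurrence count
    freq_of_freq = Counter(per_vector.values())    # occurrences -> number of vectors
    return per_vector, freq_of_freq

vectors = [[0.5, 1.0], [0.5, 1.0], [0.2, 0.3], [1.0, 1.0]]
_, dist = occurrence_distribution(vectors)
# dist == {1: 2, 2: 1}: two vectors occur once, one vector occurs twice
```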
Identified 1 non-pure unique weight vector (from 473 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 153
0.929 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 319
Removed 1 non-pure weight vector
Final number of weight vectors to use: 498
Number of unique weight vectors: 473
Time to load and analyse the weight vector file: 0.00 sec
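The non-pure vector filter above (e.g. dropping the minority-class record of the 0.929-pure vector) can be sketched as follows. This is an assumed reconstruction: each unique weight vector's pureness is its fraction of match-labelled records, and only minority-class records of impure vectors are removed; the tie-breaking rule at pureness 0.5 is an assumption.

```python
from collections import defaultdict

def remove_minority_records(records):
    """records: list of (weight_vector_tuple, is_match) pairs."""
    labels = defaultdict(list)
    for vec, is_match in records:
        labels[vec].append(is_match)
    kept = []
    for vec, is_match in records:
        flags = labels[vec]
        pureness = sum(flags) / len(flags)   # fraction of match records
        majority = pureness >= 0.5           # assumed tie-breaking rule
        if pureness in (0.0, 1.0) or is_match == majority:
            kept.append((vec, is_match))
    return kept
```

After this filter every surviving unique weight vector has a single ground-truth label, so the oracle can label a unique vector once and have it apply to all of its occurrences.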
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(473, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 473 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 473 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and misclassify 0
Classified 28 matches and 52 non-matches
Purity of oracle classification: 0.650
Entropy of oracle classification: 0.934
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 393 weight vectors
Based on 28 matches and 52 non-matches
Classified 130 matches and 263 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(130, 0.65, 0.934068055375491, 0.35)
(263, 0.65, 0.934068055375491, 0.35)
Current size of match and non-match training data sets: 28 / 52
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 130 weight vectors
- Estimated match proportion 0.350
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 130 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and misclassify 0
Classified 47 matches and 5 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.457
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
65.0
Analysing file: diverg(15)207_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 207), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)207_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 817
Number of entity ID pairs that occurred more than once: 0
Analyse set of 817 weight vectors
Containing 225 true matches and 592 true non-matches
(27.54% true matches)
Identified 760 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 723 (95.13%)
2 : 34 (4.47%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 760 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 571
Removed 1 non-pure weight vector
Final number of weight vectors to use: 816
Number of unique weight vectors: 760
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(760, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 760 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 760 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and misclassify 0
Classified 25 matches and 60 non-matches
Purity of oracle classification: 0.706
Entropy of oracle classification: 0.874
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 675 weight vectors
Based on 25 matches and 60 non-matches
Classified 128 matches and 547 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(128, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
(547, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
Current size of match and non-match training data sets: 25 / 60
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 547 weight vectors
- Estimated match proportion 0.294
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 547 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.423, 0.478, 0.500, 0.813, 0.545] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and misclassify 0
Classified 17 matches and 52 non-matches
Purity of oracle classification: 0.754
Entropy of oracle classification: 0.805
Number of true matches: 17
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)934_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (15, 1 - acm diverg, 934), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)934_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 996
Number of entity ID pairs that occurred more than once: 0
Analyse set of 996 weight vectors
Containing 170 true matches and 826 true non-matches
(17.07% true matches)
Identified 959 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 928 (96.77%)
2 : 28 (2.92%)
3 : 2 (0.21%)
6 : 1 (0.10%)
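The frequency table above counts how often each distinct weight vector occurs, with percentages relative to the number of unique vectors. One way to reproduce it (assuming vectors arrive as lists of floats):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to the number of distinct vectors with that count."""
    per_vector = Counter(map(tuple, weight_vectors))  # vector -> #occurrences
    distribution = Counter(per_vector.values())       # #occurrences -> #distinct vectors
    total_unique = len(per_vector)
    for occ in sorted(distribution):
        n = distribution[occ]
        print(f"{occ} : {n} ({100.0 * n / total_unique:.2f}%)")
    return distribution

# Toy data: three distinct vectors, one of them repeated
vecs = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.9], [0.0, 0.0]]
occurrence_distribution(vecs)  # 1 : 2 (66.67%) / 2 : 1 (33.33%)
```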
Identified 0 non-pure unique weight vectors (from 959 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 153
0.000 : 806
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 996
Number of unique weight vectors: 959
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(959, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 959 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 959 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
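The "far" sample above is a farthest-first traversal: start from one vector, then repeatedly pick the vector whose minimum distance to all already-selected vectors is largest. A minimal sketch with Euclidean distance (the program's seeding and tie-breaking may differ):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k indices from a list of vectors."""
    selected = [start]
    # Minimum distance from every vector to the currently selected set
    min_d = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], math.dist(v, vectors[nxt]))
    return selected

pts = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
print(farthest_first(pts, 3))  # [0, 1, 4]
```

Each step costs O(n) distance updates, so selecting k of n vectors is O(n*k).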
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 25 matches and 62 non-matches
Purity of oracle classification: 0.713
Entropy of oracle classification: 0.865
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 872 weight vectors
Based on 25 matches and 62 non-matches
Classified 42 matches and 830 non-matches
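The SVM step trains on the oracle-labelled sample (here 25 matches and 62 non-matches) and predicts a label for each of the remaining vectors, splitting the cluster into a predicted-match child and a predicted-non-match child. A sketch using scikit-learn; the kernel and parameters are assumptions, since the log does not show them:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labeled_vectors, labels, unlabeled_vectors):
    """Train an SVM on oracle-labelled vectors, then split the rest by prediction."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(labeled_vectors, labels)
    pred = clf.predict(unlabeled_vectors)
    return unlabeled_vectors[pred == 1], unlabeled_vectors[pred == 0]

# Toy example: high similarity values stand in for matches (label 1)
rng = np.random.default_rng(0)
train = np.vstack([rng.uniform(0.7, 1.0, (10, 7)),   # match-like vectors
                   rng.uniform(0.0, 0.4, (20, 7))])  # non-match-like vectors
y = np.array([1] * 10 + [0] * 20)
rest = rng.uniform(0.0, 1.0, (50, 7))
matches, non_matches = svm_split(train, y, rest)
print(len(matches) + len(non_matches))  # 50
```

The two resulting subsets become the new clusters pushed onto the queue in the next loop.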
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(42, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
(830, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 25 / 62
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 42 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 28
Farthest first selection of 28 weight vectors from 42 vectors
The selected farthest weight vectors are:
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[0.971, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.833, 1.000, 1.000, 0.935] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.929, 1.000, 1.000, 1.000, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
Perform oracle with 100.00% accuracy on 28 weight vectors
The oracle will correctly classify 28 weight vectors and wrongly classify 0
Classified 28 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 28 weight vectors (classified by oracle) from cluster
Cluster is pure enough and not too large, add its 42 weight vectors to:
Match training set
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 3: Queue length: 1
Number of manual oracle classifications performed: 115
Size, purity, entropy, and estimated match proportion of clusters in queue:
(830, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 67 / 62
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 830 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 830 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 16 matches and 56 non-matches
Purity of oracle classification: 0.778
Entropy of oracle classification: 0.764
Number of true matches: 16
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analysing file: diverg(15)676_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 676), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)676_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 917
Number of entity ID pairs that occurred more than once: 0
Analyse set of 917 weight vectors
Containing 199 true matches and 718 true non-matches
(21.70% true matches)
Identified 872 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 838 (96.10%)
2 : 31 (3.56%)
3 : 2 (0.23%)
11 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 872 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 174
0.909 : 1 (minority-class occurrences of vectors with this pureness to be removed)
0.000 : 697
Removed 1 non-pure weight vector
Final number of weight vectors to use: 916
Number of unique weight vectors: 872
Time to load and analyse the weight vector file: 0.01 sec
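The pureness cleaning above works per distinct vector: a unique vector is non-pure when its occurrences carry both labels, and its minority-label occurrences are dropped. Here one vector occurred 11 times with 10 matches and 1 non-match (pureness 10/11 ≈ 0.909), so the single non-match occurrence was removed, leaving 916 of 917 vectors. A sketch of that rule (names are illustrative):

```python
from collections import defaultdict

def remove_minority_labels(vectors, labels):
    """For each distinct vector, drop occurrences carrying the minority label."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept_vectors, kept_labels, removed = [], [], 0
    for vec, labs in groups.items():
        match_count = sum(labs)
        # Majority label of this vector's occurrences (ties favour matches here,
        # an assumption; the program's tie rule is not shown in the log)
        majority = 1 if match_count >= len(labs) - match_count else 0
        for lab in labs:
            if lab == majority:
                kept_vectors.append(list(vec))
                kept_labels.append(lab)
            else:
                removed += 1
    return kept_vectors, kept_labels, removed

# Toy example: one vector occurs 3 times with labels 1,1,0 -> one occurrence removed
vs = [[0.9, 0.9]] * 3 + [[0.1, 0.2]]
ls = [1, 1, 0, 0]
_, _, n_removed = remove_minority_labels(vs, ls)
print(n_removed)  # 1
```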
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(872, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 872 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 872 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 25 matches and 61 non-matches
Purity of oracle classification: 0.709
Entropy of oracle classification: 0.870
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 786 weight vectors
Based on 25 matches and 61 non-matches
Classified 125 matches and 661 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(125, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
(661, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
Current size of match and non-match training data sets: 25 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 661 weight vectors
- Estimated match proportion 0.291
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 661 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 13 matches and 58 non-matches
Purity of oracle classification: 0.817
Entropy of oracle classification: 0.687
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(20)934_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 934), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)934_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority-class occurrences of vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
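The farthest-first selection above greedily picks the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and the first vector as seed (the tool's actual seeding and metric may differ):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal selecting k vectors."""
    selected = [vectors[0]]  # seed choice is an assumption
    # Minimum distance from every candidate to the selected set so far.
    min_d = [math.dist(v, vectors[0]) for v in vectors]
    for _ in range(k - 1):
        i = max(range(len(vectors)), key=min_d.__getitem__)
        selected.append(vectors[i])
        min_d = [min(d, math.dist(v, vectors[i]))
                 for v, d in zip(vectors, min_d)]
    return selected
```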
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
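The purity and entropy figures reported for oracle classifications follow the standard two-class definitions: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A sketch, reproducing the 0.739 / 0.829 for the 23-match, 65-non-match split above:

```python
import math

def purity(matches, non_matches):
    """Fraction of the sample in the majority class."""
    return max(matches, non_matches) / (matches + non_matches)

def entropy(matches, non_matches):
    """Binary Shannon entropy (in bits) of the match proportion."""
    p = matches / (matches + non_matches)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# 23 matches / 65 non-matches: purity -> 0.739, entropy -> 0.829 (3 d.p.)
```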
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 956 non-matches
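The SVM step trains on the oracle-labelled vectors and partitions the rest of the cluster into predicted matches and non-matches. A sketch assuming scikit-learn's `SVC` (the kernel and parameters used by the actual tool are not visible in this log):

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on oracle-labelled weight vectors, then split the
    remaining vectors into predicted matches and non-matches."""
    clf = SVC(kernel="rbf")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    predictions = clf.predict(remaining_vectors)
    matches = [v for v, m in zip(remaining_vectors, predictions) if m]
    non_matches = [v for v, m in zip(remaining_vectors, predictions) if not m]
    return matches, non_matches
```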
39.0
Analysing file: diverg(10)412_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 412), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)412_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 409
Number of entity ID pairs that occurred more than once: 0
Analyse set of 409 weight vectors
Containing 200 true matches and 209 true non-matches
(48.90% true matches)
Identified 383 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 369 (96.34%)
2 : 11 (2.87%)
3 : 2 (0.52%)
12 : 1 (0.26%)
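The occurrence distribution above (369 + 2·11 + 3·2 + 12·1 = 409 vectors, 383 of them unique) amounts to counting each vector and then counting the counts. A sketch, assuming the weight vectors are hashable tuples:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map: occurrence count -> number of distinct vectors occurring that often."""
    per_vector = Counter(vectors)        # vector -> how often it occurs
    return Counter(per_vector.values())  # occurrence -> number of vectors

# For this file the result would be {1: 369, 2: 11, 3: 2, 12: 1}.
```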
Identified 1 non-pure unique weight vector (from 383 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 208
Removed 1 non-pure weight vector
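A pureness of 0.917 corresponds to a vector occurring 12 times with 11 matching labels; the single minority-class copy is removed so every surviving duplicate group is pure. A sketch of that clean-up (tie handling is an assumption; no tied case appears in this log):

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """Drop minority-class copies of any weight vector with mixed labels."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept_vecs, kept_labs = [], []
    for vec, lab in zip(vectors, labels):
        labs = groups[tuple(vec)]
        majority = sum(labs) * 2 >= len(labs)  # ties counted as matches (assumption)
        if len(set(labs)) == 1 or lab == majority:
            kept_vecs.append(vec)
            kept_labs.append(lab)
    return kept_vecs, kept_labs
```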
Final number of weight vectors to use: 408
Number of unique weight vectors: 383
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(383, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 383 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using "far" method
Farthest first selection of 77 weight vectors from 383 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 40 matches and 37 non-matches
Purity of oracle classification: 0.519
Entropy of oracle classification: 0.999
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 37
Number of false non-matches: 0
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 306 weight vectors
Based on 40 matches and 37 non-matches
Classified 133 matches and 173 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.5194805194805194, 0.9989047442823606, 0.5194805194805194)
(173, 0.5194805194805194, 0.9989047442823606, 0.5194805194805194)
Current size of match and non-match training data sets: 40 / 37
Selected cluster (queue ordering: random) with:
- Purity 0.52 and entropy 1.00
- Size 173 weight vectors
- Estimated match proportion 0.519
Sample size for this cluster: 62
Farthest first selection of 62 weight vectors from 173 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.717, 1.000, 0.240, 0.231, 0.065, 0.192, 0.184] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.800, 1.000, 0.259, 0.229, 0.214, 0.258, 0.156] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.850, 1.000, 0.179, 0.205, 0.188, 0.061, 0.180] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.663, 1.000, 0.273, 0.244, 0.226, 0.196, 0.238] (False)
[0.367, 1.000, 0.160, 0.170, 0.077, 0.200, 0.178] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.483, 0.818, 0.773, 0.478, 0.727] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.913, 1.000, 0.184, 0.175, 0.087, 0.233, 0.167] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.715, 1.000, 0.214, 0.125, 0.270, 0.214, 0.167] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.758, 1.000, 0.300, 0.140, 0.135, 0.125, 0.148] (False)
[0.488, 1.000, 0.231, 0.167, 0.105, 0.122, 0.000] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.947, 1.000, 0.292, 0.178, 0.227, 0.122, 0.154] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 62 weight vectors
The oracle will correctly classify 62 weight vectors and wrongly classify 0
Classified 8 matches and 54 non-matches
Purity of oracle classification: 0.871
Entropy of oracle classification: 0.555
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 62 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(10)532_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 532), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)532_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 370
Number of entity ID pairs that occurred more than once: 0
Analyse set of 370 weight vectors
Containing 191 true matches and 179 true non-matches
(51.62% true matches)
Identified 349 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 335 (95.99%)
2 : 11 (3.15%)
3 : 2 (0.57%)
7 : 1 (0.29%)
Identified 0 non-pure unique weight vectors (from 349 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 170
0.000 : 179
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 370
Number of unique weight vectors: 349
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(349, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 349 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 75
Perform initial selection using "far" method
Farthest first selection of 75 weight vectors from 349 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 39 matches and 36 non-matches
Purity of oracle classification: 0.520
Entropy of oracle classification: 0.999
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 36
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 274 weight vectors
Based on 39 matches and 36 non-matches
Classified 129 matches and 145 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 75
Size, purity, entropy, and estimated match proportion of clusters in queue:
(129, 0.52, 0.9988455359952018, 0.52)
(145, 0.52, 0.9988455359952018, 0.52)
Current size of match and non-match training data sets: 39 / 36
Selected cluster (queue ordering: random) with:
- Purity 0.52 and entropy 1.00
- Size 129 weight vectors
- Estimated match proportion 0.520
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 129 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 51 matches and 4 non-matches
Purity of oracle classification: 0.927
Entropy of oracle classification: 0.376
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)343_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (15, 1 - acm diverg, 343), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)343_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 865
Number of entity ID pairs that occurred more than once: 0
Analyse set of 865 weight vectors
Containing 203 true matches and 662 true non-matches
(23.47% true matches)
Identified 816 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 782 (95.83%)
2 : 31 (3.80%)
3 : 2 (0.25%)
15 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 816 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.933 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 641
Removed 1 non-pure weight vector
Final number of weight vectors to use: 864
Number of unique weight vectors: 816
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(816, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 816 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 816 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
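The purity and entropy figures reported for each oracle step follow directly from the match/non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A minimal sketch, checked against the 29/57 split above:

```python
from math import log2

def purity_entropy(n_matches, n_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy (in bits) of the match proportion."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))
    return purity, entropy

purity, entropy = purity_entropy(29, 57)
print(round(purity, 3), round(entropy, 3))  # 0.663 0.922
```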
SVM classification of 730 weight vectors
Based on 29 matches and 57 non-matches
Classified 172 matches and 558 non-matches
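The SVM step trains on the oracle-labelled samples (29 matches, 57 non-matches) and then classifies the remaining unlabelled weight vectors. The log does not show the kernel or parameters used, so the following is only a sketch with scikit-learn's `SVC` and randomly generated stand-in data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the oracle-labelled training data:
# 29 match and 57 non-match weight vectors of dimension 7.
X_train = np.vstack([rng.uniform(0.5, 1.0, (29, 7)),   # matches
                     rng.uniform(0.0, 0.5, (57, 7))])  # non-matches
y_train = np.array([1] * 29 + [0] * 57)

# Train on the labelled samples, then classify the remaining vectors.
clf = SVC(kernel="linear").fit(X_train, y_train)
X_rest = rng.uniform(0.0, 1.0, (730, 7))  # the 730 unclassified vectors
pred = clf.predict(X_rest)
print(pred.sum(), len(pred) - pred.sum())  # matches vs. non-matches
```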
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(172, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(558, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 172 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 172 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.942, 1.000, 0.156, 0.172, 0.189, 0.148, 0.133] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
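The farthest-first selection used above can be sketched as a greedy max-min traversal: after seeding with one vector, repeatedly pick the vector whose minimum distance to the already-selected set is largest. The tool's exact seeding strategy and distance metric are not shown in the log; this sketch assumes Euclidean distance and a fixed seed:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: repeatedly select the vector
    with the largest minimum Euclidean distance to the selected set."""
    X = np.asarray(vectors, dtype=float)
    selected = [seed]
    # Distance of every vector to its nearest selected vector so far.
    min_dist = np.linalg.norm(X - X[seed], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# e.g. picking 57 well-spread vectors out of 172, as in the step above
X = np.random.default_rng(1).random((172, 7))
picks = farthest_first(X, 57)
print(len(picks), len(set(picks)))  # 57 57
```

Updating `min_dist` with `np.minimum` keeps the loop O(k·n) instead of recomputing all pairwise distances each round.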
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 44 matches and 13 non-matches
Purity of oracle classification: 0.772
Entropy of oracle classification: 0.775
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 13
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing the file: diverg(15)640_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 640), dtype: object
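The f-measure in these per-file summaries is the harmonic mean of precision and recall (F1). As a sanity check against the values above (which agrees up to the rounding of the printed inputs):

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall (F1 score).
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.985294, 0.22408), 4))  # 0.3651
```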
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)640_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 663
Number of entity ID pairs that occurred more than once: 0
Analyse set of 663 weight vectors
Containing 194 true matches and 469 true non-matches
(29.26% true matches)
Identified 642 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 628 (97.82%)
2 : 11 (1.71%)
3 : 2 (0.31%)
7 : 1 (0.16%)
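The occurrence histogram above (628 vectors appearing once, 11 twice, and so on) amounts to counting how often each weight vector repeats and then counting those counts. A sketch assuming the vectors are given as rows of a list:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Count how many distinct weight vectors occur 1x, 2x, ... times."""
    per_vector = Counter(tuple(v) for v in vectors)   # vector -> count
    return Counter(per_vector.values())               # count -> #vectors

# Toy data: three distinct vectors, one of them repeated twice.
vecs = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.9]]
print(sorted(occurrence_distribution(vecs).items()))  # [(1, 2), (2, 1)]
```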
Identified 0 non-pure unique weight vectors (from 642 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.000 : 469
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 663
Number of unique weight vectors: 642
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(642, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 642 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 642 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 559 weight vectors
Based on 28 matches and 55 non-matches
Classified 127 matches and 432 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(127, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(432, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 127 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 127 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 49 matches and 2 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.239
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(15)891_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979592
recall 0.32107
f-measure 0.483627
da 98
dm 0
ndm 0
tp 96
fp 2
tn 4.76529e+07
fn 203
Name: (15, 1 - acm diverg, 891), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)891_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 953
Number of entity ID pairs that occurred more than once: 0
Analyse set of 953 weight vectors
Containing 169 true matches and 784 true non-matches
(17.73% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 885 (96.62%)
2 : 28 (3.06%)
3 : 2 (0.22%)
6 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 916 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 152
0.000 : 764
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 953
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 28 matches and 59 non-matches
Classified 120 matches and 709 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(120, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(709, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 709 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 709 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 3 matches and 72 non-matches
Purity of oracle classification: 0.960
Entropy of oracle classification: 0.242
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
98.0
Analysing the file: diverg(10)588_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 588), dtype: object
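The precision, recall, and f-measure in the Series dump above are consistent with the standard identities applied to the tp / fp / fn counts it reports. A sketch of that check:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# tp=57, fp=0, fn=242 as in the Series above
p, r, f = prf(57, 0, 242)
print(p, round(r, 6), round(f, 6))  # 1.0 0.190635 0.320225
```

With fp = 0 precision is exactly 1, and recall 57/299 yields the logged f-measure 0.320225.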
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)588_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 836
Number of entity ID pairs that occurred more than once: 0
Analyse set of 836 weight vectors
Containing 208 true matches and 628 true non-matches
(24.88% true matches)
Identified 789 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 754 (95.56%)
2 : 32 (4.06%)
3 : 2 (0.25%)
12 : 1 (0.13%)
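A distribution like the one above can be built by counting how often each weight vector occurs and then histogramming those counts. A sketch, assuming vectors are compared as exact tuples of their rounded similarity values:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map: occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in vectors)  # count copies of each unique vector
    return Counter(per_vector.values())

vectors = [[0.5, 1.0], [0.5, 1.0], [0.3, 0.2], [1.0, 1.0]]
print(sorted(occurrence_distribution(vectors).items()))  # [(1, 2), (2, 1)]
```

For the table above this accounts for all vectors: 754·1 + 32·2 + 2·3 + 1·12 = 836 total from 789 unique ones.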
Identified 1 non-pure unique weight vector (from 789 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 607
Removed 1 non-pure weight vector
Final number of weight vectors to use: 835
Number of unique weight vectors: 789
Time to load and analyse the weight vector file: 0.01 sec
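The non-pure-vector removal above drops the minority-class copies of any unique weight vector that occurs with both labels (pureness strictly between 0 and 1). Here, the 12-copy vector with pureness 0.917 loses its single minority copy, taking 836 vectors down to 835. A sketch of that step (tie handling is an assumption of this sketch):

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """Drop minority-class copies of any weight vector seen with both labels.

    labelled_vectors: list of (vector_tuple, is_match) pairs.
    Returns the filtered list.
    """
    counts = defaultdict(lambda: [0, 0])  # vector -> [non-match copies, match copies]
    for vec, is_match in labelled_vectors:
        counts[vec][int(is_match)] += 1
    kept = []
    for vec, is_match in labelled_vectors:
        non, mat = counts[vec]
        majority_is_match = mat >= non  # ties resolved toward match in this sketch
        if non == 0 or mat == 0 or is_match == majority_is_match:
            kept.append((vec, is_match))
    return kept

# One vector occurs 3 times (2 matches, 1 non-match; pureness 0.667):
# its single non-match copy is removed.
data = [((1.0, 0.9), True), ((1.0, 0.9), True),
        ((1.0, 0.9), False), ((0.1, 0.0), False)]
print(len(remove_minority_copies(data)))  # 3
```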
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(789, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 789 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 789 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
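The "far" selection above is a farthest-first traversal: starting from some vector, it repeatedly picks the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the cluster. A minimal sketch, assuming Euclidean distance and a fixed first pick (the tool's actual seeding and tie-breaking may differ):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: return indices of k selected vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [0]  # fixed first pick for determinism (an assumption)
    # Minimum distance from each vector to the selected set so far
    min_dist = [dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

points = [[0.0], [0.1], [1.0], [0.9], [0.5]]
print(farthest_first(points, 3))  # [0, 2, 4]
```

On the toy 1-D points, the traversal first jumps to the far extreme and then to the midpoint, illustrating why the sampled vectors above look so spread out.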
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 25 matches and 60 non-matches
Purity of oracle classification: 0.706
Entropy of oracle classification: 0.874
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 704 weight vectors
Based on 25 matches and 60 non-matches
Classified 123 matches and 581 non-matches
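The SVM step above fits on the oracle-labelled sample (25 matches, 60 non-matches) and splits the 704 remaining vectors into predicted matches and non-matches. A sketch of that step using scikit-learn's `SVC` — the library and kernel are assumptions, since the log names neither:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Fit an SVM on oracle-labelled vectors and split the rest into
    predicted matches and non-matches."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches

# Toy 1-D example: low weights are non-matches, high weights are matches
X = np.array([[0.0], [0.1], [0.9], [1.0]])
y = np.array([0, 0, 1, 1])
m, nm = svm_split(X, y, np.array([[0.05], [0.95]]))
print(len(m), len(nm))  # 1 1
```

In the log, the two predicted subsets become the next clusters in the queue (123 and 581 vectors here).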
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
(581, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
Current size of match and non-match training data sets: 25 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 581 weight vectors
- Estimated match proportion 0.294
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 581 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 16 matches and 54 non-matches
Purity of oracle classification: 0.771
Entropy of oracle classification: 0.776
Number of true matches: 16
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(20)114_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 114), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)114_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 118 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 118 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)38_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 38), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)38_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 657
Number of entity ID pairs that occurred more than once: 0
Analyse set of 657 weight vectors
Containing 216 true matches and 441 true non-matches
(32.88% true matches)
Identified 624 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 608 (97.44%)
2 : 13 (2.08%)
3 : 2 (0.32%)
17 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 624 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 440
Removed 1 non-pure weight vector
Final number of weight vectors to use: 656
Number of unique weight vectors: 624
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(624, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 624 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 624 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.364, 0.619, 0.471, 0.600, 0.533] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 31 matches and 52 non-matches
Purity of oracle classification: 0.627
Entropy of oracle classification: 0.953
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 541 weight vectors
Based on 31 matches and 52 non-matches
Classified 151 matches and 390 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6265060240963856, 0.9533171305598173, 0.37349397590361444)
(390, 0.6265060240963856, 0.9533171305598173, 0.37349397590361444)
Current size of match and non-match training data sets: 31 / 52
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 151 weight vectors
- Estimated match proportion 0.373
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.933, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 50 matches and 6 non-matches
Purity of oracle classification: 0.893
Entropy of oracle classification: 0.491
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
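The "farthest first selection" used in each loop above picks, one at a time, the vector farthest from everything chosen so far, so the sample spreads over the whole cluster. A sketch under stated assumptions (Euclidean distance and starting from index 0 are my choices; the original may seed differently):

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int, start: int = 0) -> list:
    """Greedy farthest-first traversal: repeatedly add the vector whose
    distance to its nearest already-selected vector is largest."""
    selected = [start]
    # distance of every vector to its nearest selected vector so far
    dists = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

# e.g. sampling 56 of 151 seven-dimensional vectors, as in Loop 2 above
sample = farthest_first(np.random.default_rng(1).uniform(size=(151, 7)), 56)
print(len(set(sample)))  # 56 distinct indices
```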
42.0
Analyzing file: diverg(15)71_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 71), dtype: object
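The precision, recall, and f-measure rows in the Series above follow from the tp/fp/fn counts it also reports; a minimal check (the helper name `prf` is mine):

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true/false positive and false
    negative counts, guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

precision, recall, f = prf(tp=57, fp=0, fn=242)
print(f"{precision:g} {recall:.6f} {f:.6f}")  # 1 0.190635 0.320225, as in the Series above
```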
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)71_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 790
Number of entity ID pairs that occurred more than once: 0
Analyse set of 790 weight vectors
Containing 208 true matches and 582 true non-matches
(26.33% true matches)
Identified 761 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 744 (97.77%)
2 : 14 (1.84%)
3 : 2 (0.26%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 761 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 789
Number of unique weight vectors: 761
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
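The duplicate-frequency and pureness analysis above groups identical weight vectors and drops the minority-class records of any group that mixes matches and non-matches. A toy sketch of that bookkeeping (the miniature data set is invented for illustration):

```python
from collections import Counter, defaultdict

# invented miniature data: (weight vector, true match?) records
records = [
    ((1.0, 1.0), True), ((1.0, 1.0), True),
    ((0.9, 0.8), True), ((0.9, 0.8), False),  # a non-pure vector
    ((0.2, 0.1), False),
]

labels = defaultdict(list)
for vec, is_match in records:
    labels[vec].append(is_match)

# Occurrence : number of unique vectors that occur that often
freq_dist = Counter(len(ls) for ls in labels.values())

# pureness of a unique vector = proportion of matches among its records
pureness = {vec: sum(ls) / len(ls) for vec, ls in labels.items()}
non_pure = [vec for vec, p in pureness.items() if 0.0 < p < 1.0]
print(freq_dist, non_pure)  # two vectors occur twice, one once; (0.9, 0.8) is non-pure
```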
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(761, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 761 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 761 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 676 weight vectors
Based on 28 matches and 57 non-matches
Classified 133 matches and 543 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 543 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 543 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 12 matches and 61 non-matches
Purity of oracle classification: 0.836
Entropy of oracle classification: 0.645
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analyzing file: diverg(15)3_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 3), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)3_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 810
Number of entity ID pairs that occurred more than once: 0
Analyse set of 810 weight vectors
Containing 223 true matches and 587 true non-matches
(27.53% true matches)
Identified 756 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 719 (95.11%)
2 : 34 (4.50%)
3 : 2 (0.26%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 756 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 566
Removed 1 non-pure weight vector
Final number of weight vectors to use: 809
Number of unique weight vectors: 756
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(756, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 756 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 756 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 671 weight vectors
Based on 27 matches and 58 non-matches
Classified 94 matches and 577 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(577, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 577 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 577 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 20 matches and 53 non-matches
Purity of oracle classification: 0.726
Entropy of oracle classification: 0.847
Number of true matches: 20
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analyzing file: diverg(10)348_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (10, 1 - acm diverg, 348), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)348_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 634
Number of entity ID pairs that occurred more than once: 0
Analyse set of 634 weight vectors
Containing 166 true matches and 468 true non-matches
(26.18% true matches)
Identified 618 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 608 (98.38%)
2 : 7 (1.13%)
3 : 2 (0.32%)
6 : 1 (0.16%)
Identified 0 non-pure unique weight vectors (from 618 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 150
0.000 : 468
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 634
Number of unique weight vectors: 618
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(618, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 618 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 618 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
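The "far" sampling used above selects weight vectors that are maximally spread out, so the oracle sees both clear matches and clear non-matches. This is the classic greedy farthest-first traversal; the Euclidean metric and seeding from the first vector are assumptions, not necessarily what the original code does:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector,
    then repeatedly add the vector whose minimum distance to the
    already-selected set is largest."""
    selected = [0]  # seed with the first vector (assumption)
    min_dist = [math.dist(vectors[0], v) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):  # shrink each min-distance to the new pick
            min_dist[i] = min(min_dist[i], math.dist(vectors[nxt], v))
    return selected

farthest_first([[0.0], [1.0], [10.0], [9.0]], 3)  # picks indices 0, 2, then 1
```

Each iteration costs O(n) distance updates, so selecting k of n vectors is O(nk), which is why sampling 83 of 618 vectors is cheap here.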
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 27 matches and 56 non-matches
Purity of oracle classification: 0.675
Entropy of oracle classification: 0.910
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
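The purity and entropy reported for this oracle sample (27 matches out of 83) follow the usual binary definitions: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the match proportion. A minimal sketch:

```python
import math

def purity_entropy(num_matches, total):
    """Binary purity (majority-class fraction) and Shannon entropy
    for a sample with num_matches matches out of total vectors."""
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

purity_entropy(27, 83)  # ~ (0.675, 0.910), matching the values logged above
```

A perfectly mixed cluster gives purity 0.5 and entropy 1.0 (the initial queue entry), while a pure cluster gives purity 1.0 and entropy 0.0.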
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 535 weight vectors
Based on 27 matches and 56 non-matches
Classified 109 matches and 426 non-matches
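The SVM step uses the oracle-labelled vectors as training data to classify the remaining unlabelled weight vectors, splitting the cluster into a predicted-match and a predicted-non-match part (hence the queue growing to length 2 in the next loop). A minimal scikit-learn sketch; the linear kernel and 0/1 labels are assumptions about the original setup:

```python
from sklearn import svm

def split_cluster(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on the oracle-classified vectors (label 1 = match,
    0 = non-match), then split the remaining vectors by prediction."""
    clf = svm.SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p == 1]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if p == 0]
    return matches, non_matches
```

Each predicted part then goes back on the queue as a new cluster to be sampled, oracle-labelled, and possibly split again.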
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.6746987951807228, 0.9100534290139191, 0.3253012048192771)
(426, 0.6746987951807228, 0.9100534290139191, 0.3253012048192771)
Current size of match and non-match training data sets: 27 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 109 weight vectors
- Estimated match proportion 0.325
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 44 matches and 4 non-matches
Purity of oracle classification: 0.917
Entropy of oracle classification: 0.414
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analysing file: diverg(10)123_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 123), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)123_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 606
Number of entity ID pairs that occurred more than once: 0
Analyse set of 606 weight vectors
Containing 187 true matches and 419 true non-matches
(30.86% true matches)
Identified 566 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 532 (93.99%)
2 : 31 (5.48%)
3 : 2 (0.35%)
6 : 1 (0.18%)
Identified 0 non-pure unique weight vectors (from 566 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 167
0.000 : 399
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 606
Number of unique weight vectors: 566
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(566, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 566 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 566 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 27 matches and 55 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 484 weight vectors
Based on 27 matches and 55 non-matches
Classified 147 matches and 337 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(147, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
(337, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
Current size of match and non-match training data sets: 27 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 147 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 147 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 47 matches and 7 non-matches
Purity of oracle classification: 0.870
Entropy of oracle classification: 0.556
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing file: diverg(20)436_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 436), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)436_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1038 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.05 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 103 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 103 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 103 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 43 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(15)430_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 430), dtype: object
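The precision, recall, and f-measure values in the block above follow directly from the reported tp/fp/fn counts. A minimal sketch reproducing them (counts taken from the log for diverg(15)430_NEW.csv):

```python
def prf(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Counts reported above: tp=57, fp=0, fn=242
p, r, f = prf(57, 0, 242)
print(round(p, 6), round(r, 6), round(f, 6))  # 1.0 0.190635 0.320225
```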
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)430_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 788
Number of entity ID pairs that occurred more than once: 0
Analyse set of 788 weight vectors
Containing 208 true matches and 580 true non-matches
(26.40% true matches)
Identified 759 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 742 (97.76%)
2 : 14 (1.84%)
3 : 2 (0.26%)
12 : 1 (0.13%)
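The occurrence histogram above (how many distinct weight vectors occur once, twice, etc.) can be computed with two nested `Counter` passes; a sketch on toy data, not the actual vectors:

```python
from collections import Counter

def occurrence_histogram(vectors):
    """Map occurrence count -> number of distinct vectors occurring that often."""
    per_vector = Counter(map(tuple, vectors))  # vector -> how often it occurs
    return Counter(per_vector.values())        # occurrence count -> number of vectors

# Toy data: one vector occurs twice, one once, one three times
vecs = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.9], [0.9, 0.9], [0.9, 0.9]]
print(sorted(occurrence_histogram(vecs).items()))  # [(1, 1), (2, 1), (3, 1)]
```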
Identified 1 non-pure unique weight vector (from 759 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 577
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 787
Number of unique weight vectors: 759
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(759, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 759 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 759 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
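The "far" initial selection above is a greedy farthest-first traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and a fixed starting index (both assumptions; the log states neither the metric nor the seed):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of numeric vectors."""
    selected = [start]
    # min_d[i] = distance from vector i to its nearest selected vector
    min_d = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], math.dist(v, vectors[nxt]))
    return selected

pts = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(farthest_first(pts, 3))  # [0, 2, 3]
```

Each round costs one pass over all vectors, so selecting k of n vectors is O(k·n) distance updates.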
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
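The purity and entropy figures reported for this oracle classification (0.647 and 0.937 for 30 matches and 55 non-matches) are the majority-class fraction and the binary Shannon entropy of the match/non-match split. They can be reproduced with:

```python
import math

def purity_entropy(n_match: int, n_non_match: int):
    """Purity = fraction of the majority class; entropy = binary Shannon
    entropy (in bits) of the match/non-match proportions."""
    total = n_match + n_non_match
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = (0.0 if p in (0.0, 1.0)
               else -(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return purity, entropy

# Oracle result above: 30 matches, 55 non-matches
pur, ent = purity_entropy(30, 55)
print(round(pur, 3), round(ent, 3))  # 0.647 0.937
```

A perfectly pure cluster (e.g. 43 matches, 0 non-matches, as in the previous run) gives purity 1.000 and entropy 0.000, matching the log.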
SVM classification of 674 weight vectors
Based on 30 matches and 55 non-matches
Classified 144 matches and 530 non-matches
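The SVM step trains on the 30 + 55 oracle-labelled vectors and classifies the remaining unlabelled ones. A sketch using scikit-learn's `SVC` on toy data; the kernel and parameters are assumptions, not taken from the log:

```python
from sklearn.svm import SVC

match_vecs = [[0.9, 1.0, 0.8], [1.0, 1.0, 0.9]]      # oracle-labelled matches (toy data)
non_match_vecs = [[1.0, 0.0, 0.3], [0.5, 0.0, 0.4]]  # oracle-labelled non-matches (toy data)
remaining = [[0.95, 1.0, 0.85], [0.8, 0.0, 0.35]]    # still unclassified

X = match_vecs + non_match_vecs
y = [1] * len(match_vecs) + [0] * len(non_match_vecs)

# Kernel choice is an assumption; the log does not state the SVM configuration
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(remaining).tolist())  # [1, 0]
```

The predicted matches and non-matches then form the two child clusters pushed onto the queue for the next loop.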
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(530, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 144 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 144 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 51 matches and 4 non-matches
Purity of oracle classification: 0.927
Entropy of oracle classification: 0.376
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)380_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 380), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)380_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)206_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (15, 1 - acm diverg, 206), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)206_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 875
Number of entity ID pairs that occurred more than once: 0
Analyse set of 875 weight vectors
Containing 189 true matches and 686 true non-matches
(21.60% true matches)
Identified 835 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 801 (95.93%)
2 : 31 (3.71%)
3 : 2 (0.24%)
6 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 835 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.000 : 666
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 875
Number of unique weight vectors: 835
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(835, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 835 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 835 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 30 matches and 56 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 749 weight vectors
Based on 30 matches and 56 non-matches
Classified 168 matches and 581 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(168, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
(581, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
Current size of match and non-match training data sets: 30 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 581 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 581 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.538, 0.789, 0.353, 0.545, 0.550] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.857, 0.417, 0.750, 0.500, 0.455] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.692, 0.292, 0.500, 0.818, 0.308] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
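The farthest-first selections throughout this log greedily pick, at each step, the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and an arbitrary (first) starting vector:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector,
    then repeatedly add the vector farthest from the selected set."""
    selected = [vectors[0]]
    # minimum distance from each vector to the selected set so far
    min_d = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], math.dist(v, vectors[idx]))
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)], 2)
print(sample)  # [(0.0, 0.0), (1.0, 1.0)]
```

This spreads the sample across the cluster, which is why the selected vectors above are so diverse rather than concentrated in one region.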
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 0 matches and 76 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 76
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing file: diverg(10)523_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990385
recall 0.344482
f-measure 0.511166
da 104
dm 0
ndm 0
tp 103
fp 1
tn 4.76529e+07
fn 196
Name: (10, 1 - acm diverg, 523), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)523_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 540
Number of entity ID pairs that occurred more than once: 0
Analyse set of 540 weight vectors
Containing 151 true matches and 389 true non-matches
(27.96% true matches)
Identified 525 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 518 (98.67%)
2 : 4 (0.76%)
3 : 2 (0.38%)
8 : 1 (0.19%)
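A frequency distribution like the one above can be derived by counting how often each distinct weight vector occurs; a sketch with `collections.Counter` (the vectors here are illustrative, not the ones from the file):

```python
from collections import Counter

# Illustrative weight vectors (tuples, so they are hashable)
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5),
           (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]

vector_counts = Counter(vectors)             # occurrences per unique vector
freq_dist = Counter(vector_counts.values())  # occurrence -> number of unique vectors

num_unique = len(vector_counts)
print(f"Identified {num_unique} unique weight vectors")
for occ in sorted(freq_dist):
    n = freq_dist[occ]
    print(f"{occ} : {n} ({100.0 * n / num_unique:.2f}%)")
```

The per-vector counts also give the "pureness" check directly: a unique vector whose duplicates carry both match and non-match labels is non-pure and gets removed, as logged below.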
Identified 1 non-pure unique weight vector (from 525 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 136
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 388
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 532
Number of unique weight vectors: 524
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(524, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 524 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 524 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.850, 0.733, 0.652, 0.778, 0.474] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 27 matches and 54 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 443 weight vectors
Based on 27 matches and 54 non-matches
Classified 95 matches and 348 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(95, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(348, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 27 / 54
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.92
- Size 95 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 95 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and wrongly classify 0
Classified 43 matches and 2 non-matches
Purity of oracle classification: 0.956
Entropy of oracle classification: 0.262
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
104.0
Analysing file: diverg(20)657_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 657), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)657_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)522_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 522), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)522_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
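The "far" initial-selection method logged above is a farthest-first traversal: start from a seed vector, then repeatedly add the candidate whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and the first vector as seed (the tool's actual distance metric and seeding rule may differ):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors.

    Starts from the first vector and repeatedly adds the vector whose
    minimum Euclidean distance to the selected set is largest.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]
    # minimum distance from each candidate to the selected set so far
    min_d = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: min_d[j])
        selected.append(vectors[i])
        # only the newest pick can lower a candidate's minimum distance
        for j, v in enumerate(vectors):
            min_d[j] = min(min_d[j], dist(v, vectors[i]))
    return selected
```

Because each new pick only requires refreshing the per-candidate minimum distance against the latest selection, choosing k of n vectors costs O(k·n) distance evaluations.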
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
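The purity and entropy reported for the oracle classification follow from the match/non-match split alone: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the match proportion. A sketch of the presumed formulas — they reproduce the logged 0.690 / 0.894 for the 27/60 split here, and 0.892 / 0.494 for the 8/66 split in loop 2:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a two-class split."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    if p in (0.0, 1.0):
        entropy = 0.0  # a pure split carries no uncertainty
    else:
        entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return purity, entropy

purity, entropy = purity_entropy(27, 60)
# matches the logged 0.690 and 0.894 after rounding
```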
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 27 matches and 60 non-matches
Classified 148 matches and 784 non-matches
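The step above trains a classifier on the 27 + 60 oracle-labelled vectors and applies it to the 932 remaining ones, splitting the cluster into predicted-match and predicted-non-match children. The tool uses an SVM for this; purely to illustrate the same train-then-split pattern without any dependencies, here is a nearest-centroid stand-in (not the SVM itself):

```python
def split_by_centroids(train_matches, train_non_matches, remaining):
    """Split `remaining` vectors by proximity to the two class centroids.

    Dependency-free stand-in for the SVM step in the log: fit on the
    oracle-labelled vectors, then partition the unlabelled rest into
    two child clusters.
    """
    def centroid(vecs):
        n = len(vecs)
        return [sum(col) / n for col in zip(*vecs)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    c_match = centroid(train_matches)
    c_non = centroid(train_non_matches)
    matches, non_matches = [], []
    for v in remaining:
        if sq_dist(v, c_match) <= sq_dist(v, c_non):
            matches.append(v)
        else:
            non_matches.append(v)
    return matches, non_matches
```

With scikit-learn available, the same pattern would be `SVC().fit(X_train, y_train)` followed by `predict` on the remaining vectors.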
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(784, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 784 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 784 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.067, 0.550, 0.636, 0.500, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 8 matches and 66 non-matches
Purity of oracle classification: 0.892
Entropy of oracle classification: 0.494
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)309_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987952
recall 0.274247
f-measure 0.429319
da 83
dm 0
ndm 0
tp 82
fp 1
tn 4.76529e+07
fn 217
Name: (10, 1 - acm diverg, 309), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)309_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 249
Number of entity ID pairs that occurred more than once: 0
Analyse set of 249 weight vectors
Containing 165 true matches and 84 true non-matches
(66.27% true matches)
Identified 232 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 224 (96.55%)
2 : 5 (2.16%)
3 : 2 (0.86%)
9 : 1 (0.43%)
Identified 1 non-pure unique weight vector (from 232 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 148
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 83
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 240
Number of unique weight vectors: 231
Time to load and analyse the weight vector file: 0.00 sec
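The non-pure filtering above groups identical weight vectors and scores each group's pureness as its fraction of true matches; here all 9 copies of the single vector with pureness 0.889 are dropped (other runs in this log instead remove only the minority-class copies). A sketch of the "remove all copies" variant, assuming pureness is the match fraction within a duplicate group:

```python
from collections import defaultdict

def remove_non_pure(vectors, labels):
    """Drop every copy of any weight vector whose duplicates carry
    mixed match/non-match labels (pureness strictly between 0 and 1)."""
    groups = defaultdict(list)
    for vec, is_match in zip(vectors, labels):
        groups[tuple(vec)].append(is_match)

    kept_vecs, kept_labels = [], []
    for vec, is_match in zip(vectors, labels):
        group = groups[tuple(vec)]
        pureness = sum(group) / len(group)  # fraction of matches in the group
        if pureness in (0.0, 1.0):  # keep only pure groups
            kept_vecs.append(vec)
            kept_labels.append(is_match)
    return kept_vecs, kept_labels
```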
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(231, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 231 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 68
Perform initial selection using "far" method
Farthest first selection of 68 weight vectors from 231 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 35 matches and 33 non-matches
Purity of oracle classification: 0.515
Entropy of oracle classification: 0.999
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 33
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 163 weight vectors
Based on 35 matches and 33 non-matches
Classified 113 matches and 50 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 68
Size, purity, entropy, and estimated match proportion of clusters in queue:
(113, 0.5147058823529411, 0.9993759069576514, 0.5147058823529411)
(50, 0.5147058823529411, 0.9993759069576514, 0.5147058823529411)
Current size of match and non-match training data sets: 35 / 33
Selected cluster with (queue ordering: random):
- Purity 0.51 and entropy 1.00
- Size 113 weight vectors
- Estimated match proportion 0.515
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 113 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.833, 1.000, 1.000, 0.935] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 47 matches and 5 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.457
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
83.0
Analysing file: diverg(20)94_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 94), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)94_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1091
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1091 weight vectors
Containing 226 true matches and 865 true non-matches
(20.71% true matches)
Identified 1034 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 997 (96.42%)
2 : 34 (3.29%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1034 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 844
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1090
Number of unique weight vectors: 1034
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1034, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1034 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1034 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 946 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 815 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(815, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 131 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)982_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990566
recall 0.351171
f-measure 0.518519
da 106
dm 0
ndm 0
tp 105
fp 1
tn 4.76529e+07
fn 194
Name: (15, 1 - acm diverg, 982), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)982_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 905
Number of entity ID pairs that occurred more than once: 0
Analyse set of 905 weight vectors
Containing 154 true matches and 751 true non-matches
(17.02% true matches)
Identified 869 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 841 (96.78%)
2 : 25 (2.88%)
3 : 2 (0.23%)
8 : 1 (0.12%)
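The occurrence distribution above can be reproduced with a small sketch (the vectors here are made-up two-weight examples; the real vectors carry seven weights):

```python
from collections import Counter

# Hypothetical weight vectors, stored as tuples so they are hashable.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9), (1.0, 0.5), (0.3, 0.1)]

vec_counts = Counter(vectors)             # unique vector -> occurrence count
freq_dist = Counter(vec_counts.values())  # occurrence -> number of unique vectors

# Same "Occurrence : count (percentage)" layout as the log
for occ in sorted(freq_dist):
    pct = 100.0 * freq_dist[occ] / len(vec_counts)
    print(f"{occ} : {freq_dist[occ]} ({pct:.2f}%)")
```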
Identified 1 non-pure unique weight vector (from 869 unique weight vectors)
Pureness (percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 730
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 897
Number of unique weight vectors: 868
Time to load and analyse the weight vector file: 0.01 sec
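The pureness filtering summarised above can be sketched as follows (vectors and labels are made up for illustration; whether a non-pure vector is removed entirely or only its minority-class copies depends on the run's setting):

```python
from collections import defaultdict

# Hypothetical (weight vector, true match label) pairs; duplicated vectors
# may carry conflicting labels, which makes them non-pure.
labelled = [((1.0, 0.9), True), ((1.0, 0.9), True),
            ((0.4, 0.1), False),
            ((0.8, 0.7), True), ((0.8, 0.7), False)]

groups = defaultdict(list)
for vec, is_match in labelled:
    groups[vec].append(is_match)

# Pureness of a unique vector = fraction of its occurrences that are matches
pureness = {vec: sum(lab) / len(lab) for vec, lab in groups.items()}

# Keep only pure vectors (pureness exactly 0.0 or 1.0)
pure_vecs = {vec for vec, p in pureness.items() if p in (0.0, 1.0)}
```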
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(868, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 868 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 868 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
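The farthest-first selection used above can be sketched as a greedy traversal (a minimal version, assuming Euclidean distance and a seeded random first pick; the actual implementation may differ):

```python
import math
import random

def farthest_first(vectors, k, seed=42):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    minimum distance to the already-selected set is largest."""
    rng = random.Random(seed)
    remaining = list(vectors)
    selected = [remaining.pop(rng.randrange(len(remaining)))]
    while len(selected) < k and remaining:
        # distance from a candidate to its nearest already-selected vector
        far = max(remaining, key=lambda v: min(math.dist(v, s) for s in selected))
        remaining.remove(far)
        selected.append(far)
    return selected
```

On a toy set, the outlier is always picked within the first two selections, which is why the listed vectors above spread over the whole cluster.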
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 23 matches and 63 non-matches
Purity of oracle classification: 0.733
Entropy of oracle classification: 0.838
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
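The purity and entropy figures reported for the oracle-labelled sample follow the usual definitions (a sketch: purity is the majority-class fraction, entropy the binary class entropy):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    # binary entropy, with the convention 0 * log2(0) = 0
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Values from this loop: 23 matches, 63 non-matches
purity, entropy = purity_entropy(23, 63)
print(f"{purity:.3f} {entropy:.3f}")  # 0.733 0.838
```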
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 782 weight vectors
Based on 23 matches and 63 non-matches
Classified 68 matches and 714 non-matches
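The split step trains a classifier on the oracle-labelled vectors and uses its predictions to divide the unlabelled remainder into two new clusters. The log uses an SVM; as a self-contained stand-in the same split can be illustrated with a simple perceptron (all data below is made up):

```python
def train_perceptron(X, y, epochs=100, lr=0.1):
    """Train a linear threshold classifier; stand-in for the SVM step."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            err = yi - pred
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
            b += lr * err
    return w, b

def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            for xi in X]

# Oracle-labelled sample (1 = match); illustrative three-weight vectors
train_X = [[1.0, 1.0, 0.9], [0.9, 1.0, 0.8], [1.0, 0.0, 0.2], [0.5, 0.0, 0.4]]
train_y = [1, 1, 0, 0]
w, b = train_perceptron(train_X, train_y)

# Classify the unlabelled remainder; the two predicted groups become the
# two new clusters pushed onto the queue.
labels = predict(w, b, [[0.95, 1.0, 0.85], [0.9, 0.0, 0.3]])  # → [1, 0]
```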
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(68, 0.7325581395348837, 0.837769869006679, 0.26744186046511625)
(714, 0.7325581395348837, 0.837769869006679, 0.26744186046511625)
Current size of match and non-match training data sets: 23 / 63
Selected cluster with (queue ordering: random):
- Purity 0.73 and entropy 0.84
- Size 68 weight vectors
- Estimated match proportion 0.267
Sample size for this cluster: 36
Farthest first selection of 36 weight vectors from 68 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
Perform oracle with 100.00% accuracy on 36 weight vectors
The oracle will correctly classify 36 weight vectors and wrongly classify 0
Classified 36 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 36 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
106.0
Analysing file: diverg(10)454_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (10, 1 - acm diverg, 454), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)454_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 526
Number of entity ID pairs that occurred more than once: 0
Analyse set of 526 weight vectors
Containing 224 true matches and 302 true non-matches
(42.59% true matches)
Identified 487 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 468 (96.10%)
2 : 16 (3.29%)
3 : 2 (0.41%)
20 : 1 (0.21%)
Identified 1 non-pure unique weight vector (from 487 unique weight vectors)
Pureness (percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 299
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 525
Number of unique weight vectors: 487
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(487, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 487 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 487 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 35 matches and 45 non-matches
Purity of oracle classification: 0.562
Entropy of oracle classification: 0.989
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 45
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 407 weight vectors
Based on 35 matches and 45 non-matches
Classified 172 matches and 235 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(172, 0.5625, 0.9886994082884974, 0.4375)
(235, 0.5625, 0.9886994082884974, 0.4375)
Current size of match and non-match training data sets: 35 / 45
Selected cluster with (queue ordering: random):
- Purity 0.56 and entropy 0.99
- Size 172 weight vectors
- Estimated match proportion 0.438
Sample size for this cluster: 61
Farthest first selection of 61 weight vectors from 172 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.821, 1.000, 0.275, 0.297, 0.227, 0.255, 0.152] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 61 weight vectors
The oracle will correctly classify 61 weight vectors and wrongly classify 0
Classified 45 matches and 16 non-matches
Purity of oracle classification: 0.738
Entropy of oracle classification: 0.830
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 16
Number of false non-matches: 0
Deleted 61 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)988_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 988), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)988_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
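"Farthest first selection" above is a greedy farthest-first traversal (k-center style): repeatedly pick the vector whose distance to its nearest already-selected vector is largest. A minimal sketch; the starting vector and the Euclidean metric are assumptions:

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Select k indices by farthest-first traversal (greedy k-center)."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]  # starting vector is an assumption
    # distance of every vector to its nearest already-selected vector
    dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist))  # farthest remaining vector
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# toy 2-D set: the two extra picks are the mutually most distant corners
print(farthest_first([[0, 0], [10, 0], [0, 10], [1, 1]], 3))  # [0, 1, 2]
```

This spreads the sample across the cluster, which is why the selected vectors above mix clear matches and clear non-matches.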
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)141_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 141), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)141_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 825
Number of entity ID pairs that occurred more than once: 0
Analyse set of 825 weight vectors
Containing 219 true matches and 606 true non-matches
(26.55% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 733 (95.32%)
2 : 33 (4.29%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 183
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 585
Removed 1 non-pure weight vector
Final number of weight vectors to use: 824
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
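The analysis phase above groups duplicate weight vectors, reports how often each unique vector occurs, and computes the pureness (fraction of true matches) per unique vector. A hedged sketch of that bookkeeping; the function and variable names are illustrative, not the program's own:

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Group duplicate weight vectors, count occurrence frequencies,
    and compute pureness (fraction of true matches) per unique vector."""
    groups = {}
    for vec, is_match in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(is_match)
    # occurrence count -> number of unique vectors occurring that often
    freq = Counter(len(g) for g in groups.values())
    pureness = {v: sum(g) / len(g) for v, g in groups.items()}
    return freq, pureness
```

Unique vectors whose pureness is strictly between 0 and 1 are the "non-pure" ones; the log removes their minority-class occurrences before clustering begins.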
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 31 matches and 54 non-matches
Purity of oracle classification: 0.635
Entropy of oracle classification: 0.947
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 31 matches and 54 non-matches
Classified 325 matches and 359 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(325, 0.6352941176470588, 0.9465202215633438, 0.36470588235294116)
(359, 0.6352941176470588, 0.9465202215633438, 0.36470588235294116)
Current size of match and non-match training data sets: 31 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.95
- Size 325 weight vectors
- Estimated match proportion 0.365
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 325 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 42 matches and 28 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(15)469_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (15, 1 - acm diverg, 469), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)469_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 929
Number of entity ID pairs that occurred more than once: 0
Analyse set of 929 weight vectors
Containing 178 true matches and 751 true non-matches
(19.16% true matches)
Identified 890 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 860 (96.63%)
2 : 27 (3.03%)
3 : 2 (0.22%)
9 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 890 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 159
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 730
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 920
Number of unique weight vectors: 889
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(889, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 889 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 889 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 23 matches and 63 non-matches
Purity of oracle classification: 0.733
Entropy of oracle classification: 0.838
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 803 weight vectors
Based on 23 matches and 63 non-matches
Classified 89 matches and 714 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(89, 0.7325581395348837, 0.837769869006679, 0.26744186046511625)
(714, 0.7325581395348837, 0.837769869006679, 0.26744186046511625)
Current size of match and non-match training data sets: 23 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.73 and entropy 0.84
- Size 714 weight vectors
- Estimated match proportion 0.267
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 714 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00 accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(15)921_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 921), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)921_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 712
Number of entity ID pairs that occurred more than once: 0
Analyse set of 712 weight vectors
Containing 217 true matches and 495 true non-matches
(30.48% true matches)
Identified 657 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 621 (94.52%)
2 : 33 (5.02%)
3 : 2 (0.30%)
19 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 657 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 474
Removed 1 non-pure weight vector
Final number of weight vectors to use: 711
Number of unique weight vectors: 657
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(657, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 657 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 657 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
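A minimal sketch of the "far" (farthest-first) sampling used above: greedily pick the vector whose minimum distance to everything already selected is largest. The random start, Euclidean metric, and seeding are assumptions; the log does not show the tool's exact implementation:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal over a set of weight vectors."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    first = int(rng.integers(len(vectors)))  # assumed random start point
    selected = [first]
    # Minimum distance from every vector to the selected set so far
    dists = np.linalg.norm(vectors - vectors[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))          # farthest remaining vector
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

# e.g. 84 samples out of 657 seven-dimensional vectors, as in the log
demo = np.random.default_rng(1).uniform(0.0, 1.0, (657, 7))
picked = farthest_first(demo, 84)
```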
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 28 matches and 56 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
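The purity and entropy figures reported for each oracle round follow the usual definitions: purity is the majority-class fraction and entropy the base-2 Shannon entropy of the match/non-match split. A sketch (the function name is illustrative):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# Reproduces the round above: 28 matches / 56 non-matches
purity, entropy = purity_entropy(28, 56)  # -> 0.667, 0.918 (rounded)
```

The same function also reproduces the 2/69 round later in the log (purity 0.972, entropy 0.185).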
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 573 weight vectors
Based on 28 matches and 56 non-matches
Classified 164 matches and 409 non-matches
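The SVM step above trains on the oracle-labelled vectors and splits the remaining cluster by predicted class. A sketch using scikit-learn's SVC on synthetic stand-in data; the kernel choice and the data are assumptions, since the log does not reveal the tool's classifier settings:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Stand-ins for the oracle-labelled training vectors (28 matches,
# 56 non-matches) and the 573 still-unlabelled cluster vectors
train_x = np.vstack([rng.uniform(0.6, 1.0, (28, 7)),
                     rng.uniform(0.0, 0.4, (56, 7))])
train_y = np.array([1] * 28 + [0] * 56)
remaining = rng.uniform(0.0, 1.0, (573, 7))

clf = SVC(kernel="rbf")  # assumed kernel; not shown in the log
clf.fit(train_x, train_y)
pred = clf.predict(remaining)

# The cluster splits into predicted-match / predicted-non-match
# sub-clusters, which re-enter the processing queue
n_match = int(pred.sum())
n_non_match = len(pred) - n_match
```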
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(164, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(409, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 28 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.92
- Size 409 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 409 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[1.000, 0.000, 0.700, 0.429, 0.476, 0.647, 0.810] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 2 matches and 69 non-matches
Purity of oracle classification: 0.972
Entropy of oracle classification: 0.185
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)955_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 955), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)955_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 907
Number of entity ID pairs that occurred more than once: 0
Analyse set of 907 weight vectors
Containing 200 true matches and 707 true non-matches
(22.05% true matches)
Identified 862 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 828 (96.06%)
2 : 31 (3.60%)
3 : 2 (0.23%)
11 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 862 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 686
Removed 1 non-pure weight vector
Final number of weight vectors to use: 906
Number of unique weight vectors: 862
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(862, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 862 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 862 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 776 weight vectors
Based on 29 matches and 57 non-matches
Classified 154 matches and 622 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(154, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(622, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 154 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 154 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 46 matches and 9 non-matches
Purity of oracle classification: 0.836
Entropy of oracle classification: 0.643
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(10)667_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.977099
recall 0.428094
f-measure 0.595349
da 131
dm 0
ndm 0
tp 128
fp 3
tn 4.76529e+07
fn 171
Name: (10, 1 - acm diverg, 667), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)667_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 151
Number of entity ID pairs that occurred more than once: 0
Analyse set of 151 weight vectors
Containing 116 true matches and 35 true non-matches
(76.82% true matches)
Identified 141 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 134 (95.04%)
2 : 4 (2.84%)
3 : 3 (2.13%)
Identified 0 non-pure unique weight vectors (from 141 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 106
0.000 : 35
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 151
Number of unique weight vectors: 141
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 141 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 57
Perform initial selection using "far" method
Farthest first selection of 57 weight vectors from 141 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 35 matches and 22 non-matches
Purity of oracle classification: 0.614
Entropy of oracle classification: 0.962
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 22
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 84 weight vectors
Based on 35 matches and 22 non-matches
Classified 84 matches and 0 non-matches
131.0
Analysing file: diverg(15)742_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 742), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)742_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 752
Number of entity ID pairs that occurred more than once: 0
Analyse set of 752 weight vectors
Containing 204 true matches and 548 true non-matches
(27.13% true matches)
Identified 723 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 706 (97.65%)
2 : 14 (1.94%)
3 : 2 (0.28%)
12 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 723 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 545
Removed 1 non-pure weight vector
Final number of weight vectors to use: 751
Number of unique weight vectors: 723
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(723, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 723 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 723 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.429, 0.786, 0.750, 0.389, 0.857] (False)
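The farthest-first selection above repeatedly picks the vector whose distance to the nearest already-selected vector is largest. A minimal sketch, assuming Euclidean distance and a fixed starting index (the actual implementation's distance metric and seeding may differ):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly pick the vector
    whose distance to the nearest already-selected vector is largest."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # distance of every vector to its nearest selected vector so far
    dists = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))          # farthest remaining vector
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

Because each new pick maximises the minimum distance to the current sample, the selection spreads over the whole cluster, which is why the listed vectors mix clear matches and clear non-matches.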
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 35 matches and 50 non-matches
Purity of oracle classification: 0.588
Entropy of oracle classification: 0.977
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
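The purity and entropy figures reported above are consistent with the standard binary definitions (majority-class fraction and base-2 Shannon entropy of the match proportion); a sketch under that assumption:

```python
from math import log2

def purity_entropy(n_match, n_non_match):
    """Purity is the majority-class fraction of the oracle labels;
    entropy is the binary Shannon entropy (base 2) of the match proportion."""
    total = n_match + n_non_match
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)
    return purity, entropy
```

With 35 matches and 50 non-matches this gives 0.588 and 0.977, matching the oracle figures above; the match proportion 35/85 ≈ 0.412 likewise matches the estimated match proportion carried into Loop 2.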
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 638 weight vectors
Based on 35 matches and 50 non-matches
Classified 308 matches and 330 non-matches
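The SVM step trains on the oracle-labelled sample and splits the remaining unlabelled vectors into predicted matches and non-matches. A hedged sketch using scikit-learn; the kernel choice and helper name are assumptions, not taken from the log:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labelled_vecs, labels, remaining_vecs):
    """Fit an SVM on the oracle-classified weight vectors and predict
    match (True) / non-match (False) for the remaining vectors."""
    clf = SVC(kernel="linear")           # assumed kernel choice
    clf.fit(labelled_vecs, labels)
    pred = clf.predict(remaining_vecs)   # boolean predictions
    remaining_vecs = np.asarray(remaining_vecs)
    return remaining_vecs[pred], remaining_vecs[~pred]
```

The two resulting subsets become the new clusters in the queue, which is why Loop 2 below starts with clusters of size 308 and 330.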
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(308, 0.5882352941176471, 0.9774178175281716, 0.4117647058823529)
(330, 0.5882352941176471, 0.9774178175281716, 0.4117647058823529)
Current size of match and non-match training data sets: 35 / 50
Selected cluster with (queue ordering: random):
- Purity 0.59 and entropy 0.98
- Size 308 weight vectors
- Estimated match proportion 0.412
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 308 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.890, 1.000, 0.281, 0.136, 0.183, 0.250, 0.163] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 45 matches and 26 non-matches
Purity of oracle classification: 0.634
Entropy of oracle classification: 0.948
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)919_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 919), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)919_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
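The frequency distribution above can be reproduced with a double counter over the raw vectors; a sketch, where `vectors` is assumed to be the list of weight vectors as read from the CSV:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of distinct
    weight vectors that occur exactly that often."""
    per_vector = Counter(map(tuple, vectors))   # vector -> how often it occurs
    dist = Counter(per_vector.values())         # occurrence count -> #vectors
    return dict(sorted(dist.items()))
```

For this file the distribution 1007/34/2/1 sums to the 1044 unique vectors, and 1007 + 2·34 + 3·2 + 20·1 = 1101 recovers the raw count.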
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as a proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
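The removal step drops the minority-class occurrences of any unique vector that carries both match and non-match labels; a sketch, where the dict layout and the in-place update are assumptions:

```python
def remove_non_pure(vec_labels):
    """vec_labels: dict mapping each unique weight vector (tuple) to its
    list of True/False ground-truth labels. For every vector that is not
    fully pure, drop the minority-class labels; return how many were dropped."""
    removed = 0
    for vec, labels in vec_labels.items():
        p_match = sum(labels) / len(labels)
        if 0.0 < p_match < 1.0:                 # mixed match/non-match labels
            majority = p_match >= 0.5
            kept = [lab for lab in labels if lab == majority]
            removed += len(labels) - len(kept)
            vec_labels[vec] = kept
    return removed
```

Here the single 0.950-pure vector (19 matches, 1 non-match) loses its one minority label, reducing the 1101 raw vectors to the 1100 reported below.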
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 956 non-matches
39.0
Analysing the file: diverg(20)543_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 543), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)543_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as a proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)819_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 819), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)819_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 292
Number of entity ID pairs that occurred more than once: 0
Analyse set of 292 weight vectors
Containing 207 true matches and 85 true non-matches
(70.89% true matches)
Identified 259 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 245 (94.59%)
2 : 11 (4.25%)
3 : 2 (0.77%)
19 : 1 (0.39%)
Identified 1 non-pure unique weight vector (from 259 unique weight vectors)
Pureness (as a proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 174
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 84
Removed 1 non-pure weight vector
Final number of weight vectors to use: 291
Number of unique weight vectors: 259
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(259, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 259 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 70
Perform initial selection using "far" method
Farthest first selection of 70 weight vectors from 259 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
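The "far" (farthest-first) selection logged above can be sketched as a greedy traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. This is a minimal sketch; the seeding rule and distance metric used by the actual tool are assumptions here.

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first traversal: start from a seed vector, then
    repeatedly add the vector whose minimum Euclidean distance to the
    selected set is largest, until k vectors are selected."""
    selected = [seed_index]
    # Minimum distance from every vector to the selected set so far.
    min_dist = [math.dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update each vector's distance to the grown selected set.
        for i, v in enumerate(vectors):
            d = math.dist(v, vectors[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

On a 1-D example `[(0,0), (1,0), (0.5,0), (10,0)]`, selecting two vectors from seed 0 picks the outlier at index 3 first, which is what makes the method useful for covering the extremes of a weight-vector cluster.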
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 38 matches and 32 non-matches
Purity of oracle classification: 0.543
Entropy of oracle classification: 0.995
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 32
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
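The purity and entropy figures reported after each oracle round follow the standard two-class definitions (majority-class fraction and Shannon entropy in bits). A minimal sketch, which reproduces the values logged above for 38 matches and 32 non-matches:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the fraction of the majority class; entropy is the
    two-class Shannon entropy (in bits) of the match distribution."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

For 38/32 this gives purity 0.5429 and entropy 0.9947, matching the cluster statistics the log carries into the next loop.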
Cluster not pure enough or too large, and can be split further
SVM classification of 189 weight vectors
Based on 38 matches and 32 non-matches
Classified 145 matches and 44 non-matches
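The SVM step above trains on the oracle-labelled vectors and splits the remaining unlabelled vectors into predicted matches and non-matches. The following sketch assumes scikit-learn's `SVC` with a linear kernel; the log does not name the actual implementation or its parameters.

```python
# Sketch only: the tool's actual SVM kernel/parameters are unknown.
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on the oracle-labelled weight vectors, then split
    the remaining vectors into predicted matches and non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches
```

The two resulting subsets become the new clusters pushed onto the queue, which is why the queue length grows to 2 in the next loop.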
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 70
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.5428571428571428, 0.9946937953613058, 0.5428571428571428)
(44, 0.5428571428571428, 0.9946937953613058, 0.5428571428571428)
Current size of match and non-match training data sets: 38 / 32
Selected cluster with (queue ordering: random):
- Purity 0.54 and entropy 0.99
- Size 145 weight vectors
- Estimated match proportion 0.543
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 145 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 49 matches and 9 non-matches
Purity of oracle classification: 0.845
Entropy of oracle classification: 0.623
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
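The oracle in these runs is simulated at 100.00% accuracy, so no labels are flipped. A hypothetical sketch of such an accuracy-controlled oracle (the function name and interface are illustrative only, not taken from the tool):

```python
import random

def noisy_oracle(true_labels, accuracy, rng=None):
    """Report each true label correctly with probability `accuracy`,
    otherwise flip it (simulating an imperfect human classifier)."""
    rng = rng or random.Random(0)
    return [lab if rng.random() < accuracy else not lab
            for lab in true_labels]
```

At accuracy 1.0 every label passes through unchanged, which is why the logged counts of false matches and false non-matches are always 0.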
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)569_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990196
recall 0.337793
f-measure 0.503741
da 102
dm 0
ndm 0
tp 101
fp 1
tn 4.76529e+07
fn 198
Name: (10, 1 - acm diverg, 569), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)569_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 462
Number of entity ID pairs that occurred more than once: 0
Analyse set of 462 weight vectors
Containing 162 true matches and 300 true non-matches
(35.06% true matches)
Identified 441 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 430 (97.51%)
2 : 8 (1.81%)
3 : 2 (0.45%)
10 : 1 (0.23%)
Identified 1 non-pure unique weight vector (from 441 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 143
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 297
Removed 1 non-pure weight vector
Final number of weight vectors to use: 461
Number of unique weight vectors: 441
Time to load and analyse the weight vector file: 0.00 sec
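The per-file analysis above (counting how often each identical weight vector occurs, then removing every copy of any vector whose true labels are mixed) can be sketched as follows; the exact grouping and removal policy of the tool are assumptions here.

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors, report the occurrence frequency
    distribution, and keep only 'pure' unique vectors, i.e. those whose
    copies all carry the same true match/non-match label."""
    groups = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(lab)
    # Occurrence : number of unique vectors that occur that often.
    freq = Counter(len(labs) for labs in groups.values())
    kept = [(list(k), labs[0]) for k, labs in groups.items()
            if len(set(labs)) == 1]
    return freq, kept
```

This matches the log's bookkeeping: a unique vector occurring 8 times with a mixed label set accounts for "Removed 8 non-pure weight vectors" while only 1 unique vector is reported as non-pure.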
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(441, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 441 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 441 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 35 matches and 44 non-matches
Purity of oracle classification: 0.557
Entropy of oracle classification: 0.991
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 362 weight vectors
Based on 35 matches and 44 non-matches
Classified 122 matches and 240 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(122, 0.5569620253164557, 0.9906174973781801, 0.4430379746835443)
(240, 0.5569620253164557, 0.9906174973781801, 0.4430379746835443)
Current size of match and non-match training data sets: 35 / 44
Selected cluster with (queue ordering: random):
- Purity 0.56 and entropy 0.99
- Size 240 weight vectors
- Estimated match proportion 0.443
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 240 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.512, 1.000, 0.087, 0.190, 0.107, 0.226, 0.204] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.600, 0.944, 0.250, 0.200, 0.186, 0.136, 0.118] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.818, 0.727, 0.438, 0.375, 0.400] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[0.821, 1.000, 0.275, 0.297, 0.227, 0.255, 0.152] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 2 matches and 66 non-matches
Purity of oracle classification: 0.971
Entropy of oracle classification: 0.191
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
102.0
Analysing file: diverg(20)507_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 507), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)507_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 701
Number of entity ID pairs that occurred more than once: 0
Analyse set of 701 weight vectors
Containing 219 true matches and 482 true non-matches
(31.24% true matches)
Identified 646 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 610 (94.43%)
2 : 33 (5.11%)
3 : 2 (0.31%)
19 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 646 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 461
Removed 1 non-pure weight vector
Final number of weight vectors to use: 700
Number of unique weight vectors: 646
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(646, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 646 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 646 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 563 weight vectors
Based on 28 matches and 55 non-matches
Classified 157 matches and 406 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(157, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(406, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 406 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 406 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 3 matches and 68 non-matches
Purity of oracle classification: 0.958
Entropy of oracle classification: 0.253
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 68
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analyzing file: diverg(10)214_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990566
recall 0.351171
f-measure 0.518519
da 106
dm 0
ndm 0
tp 105
fp 1
tn 4.76529e+07
fn 194
Name: (10, 1 - acm diverg, 214), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)214_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 880
Number of entity ID pairs that occurred more than once: 0
Analyse set of 880 weight vectors
Containing 154 true matches and 726 true non-matches
(17.50% true matches)
Identified 844 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 816 (96.68%)
2 : 25 (2.96%)
3 : 2 (0.24%)
8 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 844 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 705
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 872
Number of unique weight vectors: 843
Time to load and analyse the weight vector file: 0.01 sec
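The analysis steps above (frequency distribution of duplicate weight vectors, pureness per unique vector, removal of non-pure vectors) can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the tool's own code; note that this file's log removes all copies of a non-pure vector, while a later file's log removes only the minority class, so the exact removal policy appears configurable.

```python
from collections import Counter

def pureness_filter(vectors, labels):
    """Group identical weight vectors, compute the pureness (fraction of
    matches) of each unique vector, and keep only copies of vectors whose
    pureness is exactly 0.0 or 1.0 (hypothetical helper name)."""
    groups = {}
    for vec, is_match in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(is_match)
    pure = {k for k, ls in groups.items() if sum(ls) in (0, len(ls))}
    return [(v, l) for v, l in zip(vectors, labels) if tuple(v) in pure]

# toy data: one vector occurs 3 times with mixed labels and is removed
vecs = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5), (0.2, 0.1)]
labs = [True, True, False, False]
kept = pureness_filter(vecs, labs)

# "Occurrence : Number of weight vectors that occur that often"
freq = Counter(Counter(map(tuple, vecs)).values())
```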
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(843, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 843 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 843 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
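The "far" selection listed above is consistent with a greedy farthest-first traversal. A minimal sketch, assuming Euclidean distance over the weight dimensions and an arbitrary seed vector (the log does not show the actual metric or seeding rule):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: seed with the first vector, then
    repeatedly add the vector whose minimum Euclidean distance to the
    already-selected set is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [vectors[0]]
    while len(selected) < k:
        candidate = max((v for v in vectors if v not in selected),
                        key=lambda v: min(dist(v, s) for s in selected))
        selected.append(candidate)
    return selected

# tiny 2-D example: the three mutually farthest points are picked first
pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
picked = farthest_first(pts, 3)
```

The greedy rule spreads the sample across the cluster, which is why the listing above mixes very high and very low similarity vectors.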
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
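The purity and entropy figures logged for this sample (0.721 and 0.854 from 24 matches and 62 non-matches) are consistent with purity as the majority-class fraction and entropy as the binary Shannon entropy of the match proportion; a small sketch that reproduces them:

```python
import math

def purity(n_match, n_nonmatch):
    """Majority-class fraction of the classified sample."""
    return max(n_match, n_nonmatch) / (n_match + n_nonmatch)

def entropy(n_match, n_nonmatch):
    """Binary Shannon entropy (bits) of the match proportion."""
    total = n_match + n_nonmatch
    h = 0.0
    for c in (n_match, n_nonmatch):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

# reproduces the values logged for this oracle sample
assert round(purity(24, 62), 3) == 0.721
assert round(entropy(24, 62), 3) == 0.854
```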
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 757 weight vectors
Based on 24 matches and 62 non-matches
Classified 73 matches and 684 non-matches
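The SVM split above (757 unlabeled vectors classified using the 24 match / 62 non-match oracle samples as training data) can be sketched with scikit-learn. The kernel and parameters are assumptions, since the log does not show them, and `split_cluster` is an illustrative name:

```python
from sklearn.svm import SVC

def split_cluster(train_match, train_nonmatch, unlabeled):
    """Fit an SVM on the oracle-labeled samples, then split the remaining
    unlabeled weight vectors into predicted matches and non-matches."""
    X = train_match + train_nonmatch
    y = [1] * len(train_match) + [0] * len(train_nonmatch)
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(X, y)
    pred = clf.predict(unlabeled)
    matches = [v for v, p in zip(unlabeled, pred) if p == 1]
    non_matches = [v for v, p in zip(unlabeled, pred) if p == 0]
    return matches, non_matches

# toy 2-D example with clearly separated clusters
m, nm = split_cluster([[0.9, 0.9], [0.8, 1.0]],
                      [[0.1, 0.2], [0.0, 0.1]],
                      [[0.85, 0.95], [0.05, 0.1]])
```

The two predicted groups then re-enter the queue as separate clusters, which is why Loop 2 below shows a queue of length 2.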
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(73, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(684, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 684 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 684 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 14 matches and 55 non-matches
Purity of oracle classification: 0.797
Entropy of oracle classification: 0.728
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
106.0
Analyzing file: diverg(20)829_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 829), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)829_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)253_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (15, 1 - acm diverg, 253), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)253_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 320
Number of entity ID pairs that occurred more than once: 0
Analyse set of 320 weight vectors
Containing 187 true matches and 133 true non-matches
(58.44% true matches)
Identified 298 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 282 (94.63%)
2 : 13 (4.36%)
3 : 2 (0.67%)
6 : 1 (0.34%)
Identified 0 non-pure unique weight vectors (from 298 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 167
0.000 : 131
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 320
Number of unique weight vectors: 298
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(298, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 298 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 73
Perform initial selection using "far" method
Farthest first selection of 73 weight vectors from 298 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 33 matches and 40 non-matches
Purity of oracle classification: 0.548
Entropy of oracle classification: 0.993
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 40
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 225 weight vectors
Based on 33 matches and 40 non-matches
Classified 135 matches and 90 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 73
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.547945205479452, 0.9933570282728468, 0.4520547945205479)
(90, 0.547945205479452, 0.9933570282728468, 0.4520547945205479)
Current size of match and non-match training data sets: 33 / 40
Selected cluster (queue ordering: random) with:
- Purity 0.55 and entropy 0.99
- Size 90 weight vectors
- Estimated match proportion 0.452
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 90 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 5 matches and 42 non-matches
Purity of oracle classification: 0.894
Entropy of oracle classification: 0.489
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 42
Number of false non-matches: 0
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
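The "farthest first" selection used in every sampling round above picks a start vector and then repeatedly adds the vector farthest from everything chosen so far; a hedged sketch (Euclidean distance and a fixed start index are assumptions — the log does not show the program's metric or seeding):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: indices of k spread-out vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # Distance from every vector to its nearest already-selected vector
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

vecs = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0], [0.5, 0.5]]
picked = farthest_first(vecs, 3)   # -> [0, 1, 4]
```

The greedy choice of the point maximising the minimum distance to the current sample is what makes the selected weight vectors cover the cluster rather than concentrate in one region.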
Analysing file: diverg(20)839_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 839), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)839_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
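The analysis above groups identical weight vectors, computes each group's match fraction ("pureness"), and drops the minority-class copies of any mixed group; a minimal sketch (data and function names are illustrative, and the tie-break rule is an assumption):

```python
from collections import defaultdict

def pureness_filter(vectors, labels):
    """Group duplicate weight vectors; drop minority-class rows of mixed groups."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)

    keep = []
    for vec, labs in groups.items():
        match_frac = sum(labs) / len(labs)   # pureness of this unique vector
        majority = match_frac >= 0.5         # assumed tie-break: keep matches
        for lab in labs:
            if match_frac in (0.0, 1.0) or lab == majority:
                keep.append((list(vec), lab))
    return keep

# 20 copies of one vector with 19 matches and 1 non-match -> pureness 0.95;
# the single minority-class copy is removed, as in the log (1100 of 1101 kept)
vecs = [[1.0, 1.0]] * 20 + [[0.0, 0.0]]
labs = [True] * 19 + [False] + [False]
kept = pureness_filter(vecs, labs)
```

Removing these contradictory duplicates gives the oracle and the SVM a consistent label per unique weight vector.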
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
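The SVM step above trains on the oracle-labelled vectors and splits the remaining unlabelled vectors into predicted matches and non-matches; a hedged sketch using scikit-learn (the kernel and parameters are assumptions — the log does not show the program's SVM settings):

```python
# Sketch of the SVM split step, assuming scikit-learn's SVC; the original
# program's kernel and parameters are unknown.
from sklearn import svm

def svm_split(labelled, labels, unlabelled):
    """Train on oracle-classified weight vectors, then classify the rest."""
    clf = svm.SVC(kernel="linear")
    clf.fit(labelled, labels)        # labels: True = match, False = non-match
    pred = clf.predict(unlabelled)
    matches = [v for v, p in zip(unlabelled, pred) if p]
    non_matches = [v for v, p in zip(unlabelled, pred) if not p]
    return matches, non_matches

# Toy example: matches cluster near 1, non-matches near 0
train = [[0.9, 1.0], [1.0, 0.9], [0.1, 0.0], [0.0, 0.2]]
y = [True, True, False, False]
m, nm = svm_split(train, y, [[0.95, 0.95], [0.05, 0.1]])
```

The two predicted groups become the two new clusters pushed onto the queue, which is why Loop 2 starts with queue length 2.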
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)621_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.977778
recall 0.441472
f-measure 0.608295
da 135
dm 0
ndm 0
tp 132
fp 3
tn 4.76529e+07
fn 167
Name: (10, 1 - acm diverg, 621), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)621_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 665
Number of entity ID pairs that occurred more than once: 0
Analyse set of 665 weight vectors
Containing 132 true matches and 533 true non-matches
(19.85% true matches)
Identified 633 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 604 (95.42%)
2 : 26 (4.11%)
3 : 3 (0.47%)
Identified 0 non-pure unique weight vectors (from 633 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 120
0.000 : 513
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 665
Number of unique weight vectors: 633
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(633, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 633 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 633 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 26 matches and 57 non-matches
Purity of oracle classification: 0.687
Entropy of oracle classification: 0.897
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 550 weight vectors
Based on 26 matches and 57 non-matches
Classified 112 matches and 438 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
(438, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
Current size of match and non-match training data sets: 26 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.90
- Size 112 weight vectors
- Estimated match proportion 0.313
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 112 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.971, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00 accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 35 matches and 13 non-matches
Purity of oracle classification: 0.729
Entropy of oracle classification: 0.843
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 13
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
135.0
Analysing file: diverg(20)605_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 605), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)605_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 961
Number of entity ID pairs that occurred more than once: 0
Analyse set of 961 weight vectors
Containing 217 true matches and 744 true non-matches
(22.58% true matches)
Identified 906 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 870 (96.03%)
2 : 33 (3.64%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 906 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 723
Removed 1 non-pure weight vector
Final number of weight vectors to use: 960
Number of unique weight vectors: 906
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(906, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 906 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 906 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and incorrectly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
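The purity and entropy figures reported for an oracle classification can be reproduced from the match/non-match counts alone: purity is the majority-class fraction, entropy the binary Shannon entropy of the split. A minimal sketch (function names are illustrative, not from the original code):

```python
import math

def purity(n_matches: int, n_non_matches: int) -> float:
    """Fraction of the majority class among the classified vectors."""
    total = n_matches + n_non_matches
    return max(n_matches, n_non_matches) / total

def entropy(n_matches: int, n_non_matches: int) -> float:
    """Binary Shannon entropy (in bits) of the match/non-match split."""
    total = n_matches + n_non_matches
    p = n_matches / total
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# The 27-match / 60-non-match oracle result above:
print(f"{purity(27, 60):.3f}")   # 0.690
print(f"{entropy(27, 60):.3f}")  # 0.894
```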
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 819 weight vectors
Based on 27 matches and 60 non-matches
Classified 151 matches and 668 non-matches
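The SVM step trains on the oracle-labelled vectors (here 27 matches, 60 non-matches) and predicts match/non-match for the remaining unlabelled weight vectors. A sketch assuming scikit-learn; the log does not show the kernel or parameters used, so `kernel="linear"` is an illustrative choice:

```python
from sklearn import svm

def classify_remaining(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on the oracle-labelled weight vectors and
    predict match (1) / non-match (0) for the rest."""
    clf = svm.SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    return clf.predict(remaining_vecs)

# Toy example: low similarity weights -> non-match (0), high -> match (1).
train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = [0, 0, 1, 1]
preds = classify_remaining(train, labels, [[0.15, 0.15], [0.85, 0.85]]).tolist()
print(preds)  # [0, 1]
```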
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(668, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 151 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
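The farthest-first selections above start from one vector and repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch (Euclidean distance and seeding from index 0 are assumptions; the original code's metric and seed may differ):

```python
import math

def farthest_first(vectors, k, start=0):
    """Select k indices by farthest-first traversal over the vectors."""
    selected = [start]
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[nxt]))
    return selected

# Toy example: points on a line; farthest-first spreads the picks out.
pts = [[0.0], [0.1], [0.5], [0.9], [1.0]]
print(farthest_first(pts, 3))  # [0, 4, 2]
```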
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and incorrectly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)366_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 366), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)366_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 202
Number of entity ID pairs that occurred more than once: 0
Analyse set of 202 weight vectors
Containing 173 true matches and 29 true non-matches
(85.64% true matches)
Identified 183 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 170 (92.90%)
2 : 10 (5.46%)
3 : 2 (1.09%)
6 : 1 (0.55%)
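The occurrence distribution counts how often each distinct weight vector appears, then tabulates those counts. A two-level `collections.Counter` sketch (a hypothetical helper, not the original code):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in vectors)   # vector -> occurrences
    return Counter(per_vector.values())               # occurrences -> #vectors

# Toy example: three unique vectors, one of them appearing twice.
vecs = [[1.0, 0.5], [0.2, 0.3], [1.0, 0.5], [0.0, 0.9]]
dist = occurrence_distribution(vecs)
print(dict(sorted(dist.items())))  # {1: 2, 2: 1}
```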
Identified 0 non-pure unique weight vectors (from 183 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 154
0.000 : 29
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 202
Number of unique weight vectors: 183
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(183, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 183 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 63
Perform initial selection using "far" method
Farthest first selection of 63 weight vectors from 183 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.344, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 63 weight vectors
The oracle will correctly classify 63 weight vectors and incorrectly classify 0
Classified 41 matches and 22 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 22
Number of false non-matches: 0
Deleted 63 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 120 weight vectors
Based on 41 matches and 22 non-matches
Classified 120 matches and 0 non-matches
69.0
Analysing file: diverg(10)918_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 918), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)918_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 908
Number of entity ID pairs that occurred more than once: 0
Analyse set of 908 weight vectors
Containing 200 true matches and 708 true non-matches
(22.03% true matches)
Identified 863 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 829 (96.06%)
2 : 31 (3.59%)
3 : 2 (0.23%)
11 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 863 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 687
Removed 1 non-pure weight vector
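Pureness here is the fraction of a unique weight vector's occurrences that are true matches; a vector that is neither fully matches (1.000) nor fully non-matches (0.000) is trimmed. A sketch of one possible policy, dropping the minority-class copies (the original code's exact removal rule is not shown in full):

```python
from collections import defaultdict

def pureness_and_trim(vectors, labels):
    """Compute pureness (match fraction) per unique vector and drop the
    minority-class copies of every non-pure vector (one possible policy)."""
    groups = defaultdict(list)
    for v, is_match in zip(vectors, labels):
        groups[tuple(v)].append(is_match)
    kept_vectors, kept_labels = [], []
    for v, flags in groups.items():
        pureness = sum(flags) / len(flags)
        majority = pureness >= 0.5          # tie broken toward match
        if pureness in (0.0, 1.0):
            keep = flags                    # pure: keep everything
        else:
            keep = [f for f in flags if f == majority]  # drop minority copies
        kept_vectors += [list(v)] * len(keep)
        kept_labels += keep
    return kept_vectors, kept_labels

# Toy example: one vector seen 3x (2 matches, 1 non-match) -> drop the non-match.
vecs = [[0.9, 0.8]] * 3 + [[0.1, 0.2]]
lbls = [True, True, False, False]
kept, kl = pureness_and_trim(vecs, lbls)
print(len(kept))  # 3
```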
Final number of weight vectors to use: 907
Number of unique weight vectors: 863
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(863, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 863 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 863 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and incorrectly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 777 weight vectors
Based on 29 matches and 57 non-matches
Classified 155 matches and 622 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(155, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(622, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 155 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 155 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and incorrectly classify 0
Classified 46 matches and 9 non-matches
Purity of oracle classification: 0.836
Entropy of oracle classification: 0.643
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(10)247_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990566
recall 0.351171
f-measure 0.518519
da 106
dm 0
ndm 0
tp 105
fp 1
tn 4.76529e+07
fn 194
Name: (10, 1 - acm diverg, 247), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)247_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 664
Number of entity ID pairs that occurred more than once: 0
Analyse set of 664 weight vectors
Containing 154 true matches and 510 true non-matches
(23.19% true matches)
Identified 628 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 600 (95.54%)
2 : 25 (3.98%)
3 : 2 (0.32%)
8 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 628 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 489
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 656
Number of unique weight vectors: 627
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(627, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 627 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 627 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and incorrectly classify 0
Classified 26 matches and 57 non-matches
Purity of oracle classification: 0.687
Entropy of oracle classification: 0.897
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 544 weight vectors
Based on 26 matches and 57 non-matches
Classified 94 matches and 450 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
(450, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
Current size of match and non-match training data sets: 26 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.90
- Size 94 weight vectors
- Estimated match proportion 0.313
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 94 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
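Farthest-first selection, as used above, greedily picks the vector with the largest minimum distance to the vectors already chosen, so the sample spreads across the whole cluster. A minimal sketch, assuming Euclidean distance and a random starting vector (the logged tool may differ in both choices):

```python
import random

def euclidean(a, b):
    """Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: start from a random vector and
    repeatedly add the vector whose minimum distance to the already
    selected set is largest."""
    rng = random.Random(seed)
    first = rng.randrange(len(vectors))
    selected = [first]
    min_dist = [euclidean(v, vectors[first]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Each vector's distance to the selected set can only shrink.
        min_dist = [min(d, euclidean(v, vectors[nxt]))
                    for d, v in zip(min_dist, vectors)]
    return [vectors[i] for i in selected]
```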
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and misclassify 0
Classified 42 matches and 2 non-matches
Purity of oracle classification: 0.955
Entropy of oracle classification: 0.267
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
106.0
Analysing file: diverg(10)736_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 736), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)736_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 451
Number of entity ID pairs that occurred more than once: 0
Analyse set of 451 weight vectors
Containing 195 true matches and 256 true non-matches
(43.24% true matches)
Identified 427 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 410 (96.02%)
2 : 14 (3.28%)
3 : 2 (0.47%)
7 : 1 (0.23%)
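Occurrence histograms like the one above (how many unique vectors occur once, twice, and so on) can be computed with two nested Counters; the vector values in this sketch are hypothetical:

```python
from collections import Counter

# Hypothetical weight vectors: one value occurs twice, one occurs once.
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5)]

per_vector = Counter(vectors)             # unique vector -> occurrence count
histogram = Counter(per_vector.values())  # occurrence count -> number of vectors
print(sorted(histogram.items()))          # [(1, 1), (2, 1)]
```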
Identified 0 non-pure unique weight vectors (from 427 unique weight vectors)
Pureness (as a percentage of matches) of each unique weight vector:
Pureness : Count
1.000 : 173
0.000 : 254
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 451
Number of unique weight vectors: 427
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(427, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 427 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 427 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and misclassify 0
Classified 36 matches and 42 non-matches
Purity of oracle classification: 0.538
Entropy of oracle classification: 0.996
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 42
Number of false non-matches: 0
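The oracle in these runs is simulated with a configurable accuracy: at 100.00% it simply returns each vector's true label, while at lower accuracies each returned label would be flipped with probability 1 - accuracy. A sketch of such a simulated oracle (the actual implementation is not shown in the log):

```python
import random

def simulated_oracle(true_labels, accuracy=1.0, seed=0):
    """Return oracle labels: each true label is kept with probability
    `accuracy` and flipped otherwise (a sketch of the simulation)."""
    rng = random.Random(seed)
    return [lab if rng.random() < accuracy else not lab
            for lab in true_labels]

# At 100% accuracy the oracle reproduces the true labels exactly,
# which is why the log always reports 0 false (non-)matches.
labels = simulated_oracle([True, False, True], accuracy=1.0)
```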
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 349 weight vectors
Based on 36 matches and 42 non-matches
Classified 140 matches and 209 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.5384615384615384, 0.9957274520849256, 0.46153846153846156)
(209, 0.5384615384615384, 0.9957274520849256, 0.46153846153846156)
Current size of match and non-match training data sets: 36 / 42
Selected cluster (queue ordering: random) with:
- Purity 0.54 and entropy 1.00
- Size 209 weight vectors
- Estimated match proportion 0.462
Sample size for this cluster: 66
Farthest first selection of 66 weight vectors from 209 vectors
The selected farthest weight vectors are:
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
Perform oracle with 100.00% accuracy on 66 weight vectors
The oracle will correctly classify 66 weight vectors and misclassify 0
Classified 4 matches and 62 non-matches
Purity of oracle classification: 0.939
Entropy of oracle classification: 0.330
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 66 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)753_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 753), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)753_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 611
Number of entity ID pairs that occurred more than once: 0
Analyse set of 611 weight vectors
Containing 191 true matches and 420 true non-matches
(31.26% true matches)
Identified 585 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 570 (97.44%)
2 : 12 (2.05%)
3 : 2 (0.34%)
11 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 585 unique weight vectors)
Pureness (as a percentage of matches) of each unique weight vector:
Pureness : Count
1.000 : 167
0.909 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 417
Removed 1 non-pure weight vector
Final number of weight vectors to use: 610
Number of unique weight vectors: 585
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(585, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 585 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 585 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and misclassify 0
Classified 32 matches and 50 non-matches
Purity of oracle classification: 0.610
Entropy of oracle classification: 0.965
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 503 weight vectors
Based on 32 matches and 50 non-matches
Classified 142 matches and 361 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6097560975609756, 0.9649567669505688, 0.3902439024390244)
(361, 0.6097560975609756, 0.9649567669505688, 0.3902439024390244)
Current size of match and non-match training data sets: 32 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.61 and entropy 0.96
- Size 142 weight vectors
- Estimated match proportion 0.390
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 142 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and misclassify 0
Classified 50 matches and 6 non-matches
Purity of oracle classification: 0.893
Entropy of oracle classification: 0.491
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(10)427_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 427), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)427_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 484
Number of entity ID pairs that occurred more than once: 0
Analyse set of 484 weight vectors
Containing 181 true matches and 303 true non-matches
(37.40% true matches)
Identified 458 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 447 (97.60%)
2 : 8 (1.75%)
3 : 2 (0.44%)
15 : 1 (0.22%)
Identified 1 non-pure unique weight vector (from 458 unique weight vectors)
Pureness (as a percentage of matches) of each unique weight vector:
Pureness : Count
1.000 : 155
0.933 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 302
Removed 1 non-pure weight vector
Final number of weight vectors to use: 483
Number of unique weight vectors: 458
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(458, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 458 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 458 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.786, 0.833, 0.545, 0.478, 0.346] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 27 matches and 52 non-matches
Purity of oracle classification: 0.658
Entropy of oracle classification: 0.927
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
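The purity and entropy figures reported by the oracle step match the usual definitions: purity is the majority-class fraction of the labelled sample, entropy is the binary entropy of the match proportion. A minimal sketch of that computation (the function name is illustrative, not from the tool):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total                            # match proportion
    purity = max(num_matches, num_non_matches) / total
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Figures from the oracle step above: 27 matches, 52 non-matches.
purity, entropy = purity_entropy(27, 52)
print(round(purity, 3), round(entropy, 3))   # 0.658 0.927
```

The same formula reproduces the (size, purity, entropy, match proportion) tuples printed for the cluster queue below.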
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 379 weight vectors
Based on 27 matches and 52 non-matches
Classified 136 matches and 243 non-matches
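The SVM step trains on the oracle-labelled sample (here 27 matches and 52 non-matches) and classifies the remaining cluster vectors, splitting the cluster into a predicted-match and a predicted-non-match child. A sketch with scikit-learn; the kernel and parameters are assumptions, not the tool's actual settings:

```python
import numpy as np
from sklearn import svm

def split_cluster(train_vecs, train_labels, cluster_vecs):
    """Train an SVM on oracle-labelled weight vectors, then split the
    remaining cluster vectors into predicted matches / non-matches."""
    clf = svm.SVC(kernel="linear")     # kernel choice is an assumption
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(cluster_vecs)
    return cluster_vecs[pred == 1], cluster_vecs[pred == 0]
```

Both resulting sub-clusters are then pushed onto the queue, as the next loop's "Queue length: 2" line shows.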
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.6582278481012658, 0.9265044456232998, 0.34177215189873417)
(243, 0.6582278481012658, 0.9265044456232998, 0.34177215189873417)
Current size of match and non-match training data sets: 27 / 52
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 243 weight vectors
- Estimated match proportion 0.342
Sample size for this cluster: 64
Farthest first selection of 64 weight vectors from 243 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.750, 0.905, 0.667, 0.500, 0.571] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.680, 0.000, 0.609, 0.737, 0.600, 0.529, 0.696] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[1.000, 0.000, 0.333, 0.667, 0.400, 0.583, 0.563] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.545, 0.667, 0.571, 0.350, 0.600] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.667, 0.722, 0.353, 0.545, 0.800] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.433, 0.737, 0.706, 0.500, 0.800] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.500, 0.739, 0.824, 0.591, 0.550] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
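The "farthest first selection" used to draw these samples can be sketched as a greedy farthest-first traversal: repeatedly pick the vector with the largest minimum distance to everything already selected. A minimal version, assuming Euclidean distance and an arbitrary starting vector (both assumptions — the log does not show the tool's actual metric or seed):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k row indices (Euclidean distance)."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # Minimum distance from every vector to the selected set so far.
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))   # farthest from the current selection
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

corners = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]]
print(farthest_first(corners, 4))   # [0, 3, 1, 2] -- the four corners first
```

This greedy rule explains why the selected vectors above spread toward the extremes of the weight space (many 0.000 and 1.000 components).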
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 0 matches and 64 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing file: diverg(20)521_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 521), dtype: object
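The f-measure in the result row above is the harmonic mean of precision and recall; with precision 1 and recall 0.140468 that gives 0.246334, matching the printed value. A one-line check:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(1.0, 0.140468), 6))   # 0.246334
```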
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)521_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1027
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1027 weight vectors
Containing 223 true matches and 804 true non-matches
(21.71% true matches)
Identified 973 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 936 (96.20%)
2 : 34 (3.49%)
3 : 2 (0.21%)
17 : 1 (0.10%)
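The distribution above (936 singletons + 34 pairs + 2 triples + 1 vector occurring 17 times = 973 unique out of 1027) can be reproduced by counting how often each weight vector occurs and then counting those counts. A sketch with collections.Counter, on toy data rather than the actual file:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of unique vectors
    that occur exactly that often."""
    per_vector = Counter(tuple(v) for v in vectors)   # vector -> occurrences
    return Counter(per_vector.values())               # occurrences -> #vectors

vecs = [(1.0, 0.0)] * 3 + [(0.5, 0.5)] * 2 + [(0.0, 1.0)]
print(sorted(occurrence_distribution(vecs).items()))  # [(1, 1), (2, 1), (3, 1)]
```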
Identified 1 non-pure unique weight vector (from 973 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 783
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1026
Number of unique weight vectors: 973
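A unique weight vector is non-pure when its occurrences carry both labels; its pureness is the match fraction (here one non-match among 17 copies, 16/17 ≈ 0.941), and the minority-class occurrences are dropped, which is why 1027 vectors become 1026. A sketch of that clean-up, assuming labelled (vector, is_match) records; the tie-breaking rule is an assumption:

```python
from collections import defaultdict

def remove_minority(records):
    """Drop minority-class occurrences of non-pure unique weight vectors.
    records: iterable of (vector_tuple, is_match) pairs."""
    by_vec = defaultdict(list)
    for vec, label in records:
        by_vec[vec].append(label)
    kept = []
    for vec, labels in by_vec.items():
        matches = sum(labels)
        if 0 < matches < len(labels):          # non-pure: both labels present
            majority = matches > len(labels) - matches  # ties -> non-match (assumption)
            kept += [(vec, majority)] * sum(1 for l in labels if l == majority)
        else:                                  # pure: keep all occurrences
            kept += [(vec, l) for l in labels]
    return kept

# 17 copies of one vector: 16 matches + 1 non-match (pureness 16/17 = 0.941).
records = [((1.0,), True)] * 16 + [((1.0,), False)] + [((0.0,), False)] * 5
print(len(remove_minority(records)))   # 21 -- the single minority record dropped
```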
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(973, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 973 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 973 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 886 weight vectors
Based on 26 matches and 61 non-matches
Classified 131 matches and 755 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(755, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 755 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 755 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 11 matches and 62 non-matches
Purity of oracle classification: 0.849
Entropy of oracle classification: 0.612
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(15)784_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 784), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)784_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1050
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1050 weight vectors
Containing 208 true matches and 842 true non-matches
(19.81% true matches)
Identified 1003 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 968 (96.51%)
2 : 32 (3.19%)
3 : 2 (0.20%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1003 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 821
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1049
Number of unique weight vectors: 1003
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1003, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1003 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1003 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 916 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(793, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 793 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 793 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 12 matches and 58 non-matches
Purity of oracle classification: 0.829
Entropy of oracle classification: 0.661
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
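The purity and entropy figures reported for each oracle classification above follow directly from the match / non-match counts; a minimal sketch, assuming purity is the majority-class fraction of the sample and entropy is the binary Shannon entropy of the match proportion (both consistent with the logged values):

```python
import math

def purity(matches: int, non_matches: int) -> float:
    """Majority-class fraction of the classified sample."""
    total = matches + non_matches
    return max(matches, non_matches) / total

def entropy(matches: int, non_matches: int) -> float:
    """Binary Shannon entropy of the match proportion."""
    total = matches + non_matches
    h = 0.0
    for count in (matches, non_matches):
        p = count / total
        if p > 0.0:
            h -= p * math.log2(p)
    return h

# 12 matches / 58 non-matches, as in the oracle classification above
print(round(purity(12, 58), 3))   # 0.829
print(round(entropy(12, 58), 3))  # 0.661
```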
Analysing file: diverg(15)250_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (15, 1 - acm diverg, 250), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)250_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 690
Number of entity ID pairs that occurred more than once: 0
Analyse set of 690 weight vectors
Containing 178 true matches and 512 true non-matches
(25.80% true matches)
Identified 651 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 621 (95.39%)
2 : 27 (4.15%)
3 : 2 (0.31%)
9 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 651 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 159
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 491
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 681
Number of unique weight vectors: 650
Time to load and analyse the weight vector file: 0.01 sec
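The pureness reported per unique weight vector above is the fraction of true matches among all identical copies of that vector; a small sketch, assuming vectors are grouped by exact equality (the grouping key and the toy data are illustrative assumptions):

```python
from collections import defaultdict

def pureness(labelled_vectors):
    """Fraction of true matches per unique weight vector.

    labelled_vectors: list of (tuple_of_weights, is_match) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # vector -> [matches, total]
    for vec, is_match in labelled_vectors:
        counts[vec][0] += int(is_match)
        counts[vec][1] += 1
    return {vec: m / t for vec, (m, t) in counts.items()}

# a vector occurring 9 times: 8 matches and 1 non-match
data = [((1.0, 0.9), True)] * 8 + [((1.0, 0.9), False)]
print(round(pureness(data)[(1.0, 0.9)], 3))  # 0.889, as in the log above
```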
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(650, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 650 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 650 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 26 matches and 57 non-matches
Purity of oracle classification: 0.687
Entropy of oracle classification: 0.897
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 567 weight vectors
Based on 26 matches and 57 non-matches
Classified 115 matches and 452 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(115, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
(452, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
Current size of match and non-match training data sets: 26 / 57
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.90
- Size 115 weight vectors
- Estimated match proportion 0.313
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 115 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 46 matches and 2 non-matches
Purity of oracle classification: 0.958
Entropy of oracle classification: 0.250
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
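The "far" initial selections in the runs above can be read as a greedy farthest-first traversal: start from a seed vector and repeatedly add the vector whose minimum distance to the already-selected set is largest. A sketch under the assumption of Euclidean distance and a fixed seed (the actual seeding and metric are not stated in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily select k indices from vectors by farthest-first traversal."""
    selected = [start]
    # minimum distance from each vector to the selected set so far
    dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            dist[i] = min(dist[i], math.dist(v, vectors[nxt]))
    return selected

# hypothetical 2-d weight vectors for illustration
vecs = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0], [0.5, 0.5]]
print(farthest_first(vecs, 3))  # [0, 1, 4]
```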
Analysing file: diverg(10)302_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984127
recall 0.207358
f-measure 0.342541
da 63
dm 0
ndm 0
tp 62
fp 1
tn 4.76529e+07
fn 237
Name: (10, 1 - acm diverg, 302), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)302_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 446
Number of entity ID pairs that occurred more than once: 0
Analyse set of 446 weight vectors
Containing 200 true matches and 246 true non-matches
(44.84% true matches)
Identified 414 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 398 (96.14%)
2 : 13 (3.14%)
3 : 2 (0.48%)
16 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 414 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 170
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 243
Removed 1 non-pure weight vector
Final number of weight vectors to use: 445
Number of unique weight vectors: 414
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(414, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 414 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 414 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.833, 0.550, 0.500, 0.688] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 39 matches and 39 non-matches
Purity of oracle classification: 0.500
Entropy of oracle classification: 1.000
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 336 weight vectors
Based on 39 matches and 39 non-matches
Classified 273 matches and 63 non-matches
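The SVM step above trains on the oracle-labelled sample and classifies the remaining vectors of the cluster, and the two predicted classes become the child clusters in the queue. A sketch using scikit-learn's SVC (the library and kernel are assumptions; the log does not name them), with hypothetical toy vectors:

```python
from sklearn import svm

# oracle-labelled sample: weight vectors labelled True (match) / False (non-match)
train_X = [[1.0, 1.0, 0.9], [0.9, 1.0, 0.8], [1.0, 0.0, 0.3], [0.7, 0.0, 0.4]]
train_y = [True, True, False, False]

clf = svm.SVC(kernel="linear")
clf.fit(train_X, train_y)

# classify the unlabelled remainder of the cluster;
# predicted matches and non-matches form the two new child clusters
remainder = [[0.95, 1.0, 0.85], [0.8, 0.0, 0.35]]
pred = clf.predict(remainder)
print(list(pred))
```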
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(273, 0.5, 1.0, 0.5)
(63, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 39 / 39
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 273 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 273 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.913, 1.000, 0.184, 0.175, 0.087, 0.233, 0.167] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 41 matches and 30 non-matches
Purity of oracle classification: 0.577
Entropy of oracle classification: 0.983
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 30
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
63.0
Analysing file: diverg(15)560_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981481
recall 0.177258
f-measure 0.300283
da 54
dm 0
ndm 0
tp 53
fp 1
tn 4.76529e+07
fn 246
Name: (15, 1 - acm diverg, 560), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)560_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 829
Number of entity ID pairs that occurred more than once: 0
Analyse set of 829 weight vectors
Containing 212 true matches and 617 true non-matches
(25.57% true matches)
Identified 775 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 740 (95.48%)
2 : 32 (4.13%)
3 : 2 (0.26%)
19 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 775 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 596
Removed 1 non-pure weight vector
Final number of weight vectors to use: 828
Number of unique weight vectors: 775
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(775, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 775 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 775 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
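The purity and entropy figures reported above are consistent with majority-class purity and binary Shannon entropy computed over the oracle-labelled sample; a minimal sketch under that assumption (the function name `sample_stats` is hypothetical, not taken from the original code):

```python
import math

def sample_stats(matches, non_matches):
    # Majority-class purity and binary Shannon entropy of the
    # oracle-labelled sample (assumed definitions; they reproduce
    # the values printed in the log).
    total = matches + non_matches
    p = matches / total
    purity = max(p, 1.0 - p)
    if p in (0.0, 1.0):
        entropy = 0.0  # a pure sample carries no uncertainty
    else:
        entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return purity, entropy
```

With the 28 matches and 57 non-matches above, this yields purity 0.671 and entropy 0.914, matching the logged values.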
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 690 weight vectors
Based on 28 matches and 57 non-matches
Classified 148 matches and 542 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(542, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 542 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 542 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
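Farthest-first selection, as announced above, is typically a greedy traversal that repeatedly picks the vector whose minimum distance to the already-selected set is largest; a sketch under that assumption (the seed vector and Euclidean metric are guesses, and the original implementation may differ):

```python
import math

def farthest_first(vectors, k, start=0):
    # Greedy farthest-first traversal (hypothetical re-implementation;
    # the logged run may use a different seed point or distance metric).
    selected = [list(vectors[start])]
    # current minimum distance of every vector to the selected set
    dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        i = max(range(len(vectors)), key=dist.__getitem__)  # farthest remaining vector
        selected.append(list(vectors[i]))
        dist = [min(d, math.dist(v, vectors[i])) for d, v in zip(dist, vectors)]
    return selected
```

Selecting points this way spreads the sample across the weight-vector space, which is why the sampled vectors above include both clear matches and clear non-matches.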
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 4 matches and 69 non-matches
Purity of oracle classification: 0.945
Entropy of oracle classification: 0.306
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
54.0
Analyzing file: diverg(20)848_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 848), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)848_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 667
Number of entity ID pairs that occurred more than once: 0
Analyse set of 667 weight vectors
Containing 217 true matches and 450 true non-matches
(32.53% true matches)
Identified 630 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 612 (97.14%)
2 : 15 (2.38%)
3 : 2 (0.32%)
19 : 1 (0.16%)
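The frequency distribution above can be reproduced with a two-level count: occurrences per unique vector, then how many unique vectors share each occurrence count (a sketch; `occurrence_distribution` is a hypothetical helper name):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # Map each occurrence count to the number of unique weight
    # vectors that occur exactly that many times.
    per_vector = Counter(map(tuple, weight_vectors))   # occurrences per unique vector
    return dict(sorted(Counter(per_vector.values()).items()))
```

For this file it would give {1: 612, 2: 15, 3: 2, 19: 1}, i.e. 630 unique vectors out of 667 in total.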
Identified 1 non-pure unique weight vector (from 630 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 447
Removed 1 non-pure weight vector
Final number of weight vectors to use: 666
Number of unique weight vectors: 630
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(630, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 630 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 630 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 26 matches and 57 non-matches
Purity of oracle classification: 0.687
Entropy of oracle classification: 0.897
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 547 weight vectors
Based on 26 matches and 57 non-matches
Classified 133 matches and 414 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
(414, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
Current size of match and non-match training data sets: 26 / 57
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.90
- Size 133 weight vectors
- Estimated match proportion 0.313
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 133 vectors
The selected farthest weight vectors are:
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 49 matches and 2 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.239
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analyzing file: diverg(15)508_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (15, 1 - acm diverg, 508), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)508_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 913
Number of entity ID pairs that occurred more than once: 0
Analyse set of 913 weight vectors
Containing 204 true matches and 709 true non-matches
(22.34% true matches)
Identified 862 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 828 (96.06%)
2 : 31 (3.60%)
3 : 2 (0.23%)
17 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 862 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 173
0.941 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 688
Removed 1 non-pure weight vector
Final number of weight vectors to use: 912
Number of unique weight vectors: 862
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(862, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 862 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 862 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 776 weight vectors
Based on 28 matches and 58 non-matches
Classified 146 matches and 630 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(630, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 630 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 630 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.667, 0.538, 0.455, 0.581, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 3 matches and 71 non-matches
Purity of oracle classification: 0.959
Entropy of oracle classification: 0.245
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analyzing file: diverg(15)875_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 875), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)875_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 645
Number of entity ID pairs that occurred more than once: 0
Analyse set of 645 weight vectors
Containing 215 true matches and 430 true non-matches
(33.33% true matches)
Identified 593 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 557 (93.93%)
2 : 33 (5.56%)
3 : 2 (0.34%)
16 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 593 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 183
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 409
Removed 1 non-pure weight vector
Final number of weight vectors to use: 644
Number of unique weight vectors: 593
Time to load and analyse the weight vector file: 0.01 sec
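The uniqueness, frequency, and pureness statistics above can be reproduced by grouping the labelled weight vectors and dropping the minority-class copies of any non-pure unique vector; a minimal sketch (the toy data mirrors the 16-copy vector with pureness 15/16 ≈ 0.938 reported above; the vectors and labels themselves are illustrative):

```python
from collections import defaultdict

def analyse_vectors(labelled):
    """Group (vector, is_match) pairs by vector; for each unique vector
    compute its pureness (fraction of matches) and, if it is non-pure,
    keep only the majority-class copies."""
    groups = defaultdict(list)
    for vec, is_match in labelled:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)
        if pureness in (0.0, 1.0):           # pure: keep all copies
            kept.extend((vec, lab) for lab in labels)
        else:                                # non-pure: majority class only
            majority = pureness > 0.5
            kept.extend((vec, majority) for lab in labels if lab == majority)
    return kept

data = [((1.0, 0.9), True)] * 15 + [((1.0, 0.9), False)] \
     + [((0.2, 0.1), False)] * 2
print(len(analyse_vectors(data)))  # 17 -> the one minority-class copy removed
```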
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(593, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 593 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 593 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
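The "far" initial-selection method producing the list above is a farthest-first traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance (the log does not state the metric):

```python
import math

def farthest_first(vectors, k):
    """Greedily select k vectors: each step adds the vector whose minimum
    Euclidean distance to the vectors selected so far is largest."""
    selected = [vectors[0]]
    # current minimum distance from every vector to the selected set
    dists = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=dists.__getitem__)
        selected.append(vectors[i])
        dists = [min(d, math.dist(v, vectors[i]))
                 for d, v in zip(dists, vectors)]
    return selected

pts = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
print(farthest_first(pts, 2))  # [(0.0, 0.0), (1.0, 1.0)]
```

Because each selected vector has distance 0 to itself, it can never be re-selected, so the traversal spreads the sample across the whole cluster.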
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 28 matches and 54 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 511 weight vectors
Based on 28 matches and 54 non-matches
Classified 146 matches and 365 non-matches
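The split step trains a classifier on the oracle-labelled vectors and partitions the remaining cluster by its predictions, pushing both halves back onto the queue. A minimal sketch, assuming scikit-learn's SVC with a linear kernel (the log does not state the implementation or parameters; `split_cluster` and the toy data are illustrative):

```python
import numpy as np
from sklearn import svm

def split_cluster(train_vecs, train_labels, remaining):
    """Train an SVM on oracle-classified vectors, then split the remaining
    unlabelled vectors into predicted-match and predicted-non-match
    sub-clusters."""
    clf = svm.SVC(kernel="linear")  # assumed kernel
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(remaining)
    return remaining[pred == 1], remaining[pred == 0]

rng = np.random.default_rng(0)
train = np.vstack([rng.uniform(0.6, 1.0, (5, 2)),   # match-like vectors
                   rng.uniform(0.0, 0.4, (5, 2))])  # non-match-like vectors
labels = np.array([1] * 5 + [0] * 5)
rest = np.array([[0.9, 0.9], [0.1, 0.1], [0.8, 0.7]])
m, nm = split_cluster(train, labels, rest)
print(len(m), len(nm))  # 2 1
```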
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6585365853658537, 0.9262122127346665, 0.34146341463414637)
(365, 0.6585365853658537, 0.9262122127346665, 0.34146341463414637)
Current size of match and non-match training data sets: 28 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 365 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 365 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.714, 0.727, 0.750, 0.294, 0.833] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.348, 0.429, 0.526, 0.529, 0.619] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 10 matches and 60 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 10
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(15)297_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 297), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)297_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 465
Number of entity ID pairs that occurred more than once: 0
Analyse set of 465 weight vectors
Containing 197 true matches and 268 true non-matches
(42.37% true matches)
Identified 441 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 424 (96.15%)
2 : 14 (3.17%)
3 : 2 (0.45%)
7 : 1 (0.23%)
Identified 0 non-pure unique weight vectors (from 441 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.000 : 266
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 465
Number of unique weight vectors: 441
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(441, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 441 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 441 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 33 matches and 46 non-matches
Purity of oracle classification: 0.582
Entropy of oracle classification: 0.980
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 46
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 362 weight vectors
Based on 33 matches and 46 non-matches
Classified 135 matches and 227 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.5822784810126582, 0.980377508715691, 0.4177215189873418)
(227, 0.5822784810126582, 0.980377508715691, 0.4177215189873418)
Current size of match and non-match training data sets: 33 / 46
Selected cluster (queue ordering: random) with:
- Purity 0.58 and entropy 0.98
- Size 227 weight vectors
- Estimated match proportion 0.418
Sample size for this cluster: 66
Farthest first selection of 66 weight vectors from 227 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.778, 0.636, 0.375, 0.556, 0.625] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 66 weight vectors
The oracle will correctly classify 66 weight vectors and wrongly classify 0
Classified 7 matches and 59 non-matches
Purity of oracle classification: 0.894
Entropy of oracle classification: 0.488
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 66 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(10)188_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 188), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)188_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 586
Number of entity ID pairs that occurred more than once: 0
Analyse set of 586 weight vectors
Containing 186 true matches and 400 true non-matches
(31.74% true matches)
Identified 546 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 512 (93.77%)
2 : 31 (5.68%)
3 : 2 (0.37%)
6 : 1 (0.18%)
Identified 0 non-pure unique weight vectors (from 546 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.000 : 380
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 586
Number of unique weight vectors: 546
Time to load and analyse the weight vector file: 0.04 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(546, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 546 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 546 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 32 matches and 49 non-matches
Purity of oracle classification: 0.605
Entropy of oracle classification: 0.968
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 49
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 465 weight vectors
Based on 32 matches and 49 non-matches
Classified 156 matches and 309 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6049382716049383, 0.9679884922470297, 0.3950617283950617)
(309, 0.6049382716049383, 0.9679884922470297, 0.3950617283950617)
Current size of match and non-match training data sets: 32 / 49
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 309 weight vectors
- Estimated match proportion 0.395
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 309 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.000, 0.600, 0.818, 0.571, 0.524] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.556, 0.348, 0.467, 0.636, 0.412] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[1.000, 0.000, 0.538, 0.600, 0.471, 0.632, 0.688] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.800, 0.667, 0.381, 0.550, 0.429] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.741, 0.556, 0.667, 0.350, 0.556] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.800, 0.000, 0.444, 0.545, 0.333, 0.111, 0.533] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.429, 0.571, 0.333, 0.444, 0.400] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.462, 0.667, 0.636, 0.368, 0.500] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.583, 0.786, 0.842, 0.800, 0.833] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.368, 0.235, 0.429] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
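The farthest-first selections in this log can be sketched as a greedy k-center traversal; the real code's distance metric and starting point are assumptions here:

```python
def farthest_first(points, k, start=0):
    """Greedy k-center traversal: repeatedly pick the point whose minimum
    distance to the already-selected set is largest."""
    def dist2(a, b):                      # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))
    selected = [start]
    min_d = [dist2(p, points[start]) for p in points]
    while len(selected) < k:
        far = max(range(len(points)), key=min_d.__getitem__)
        selected.append(far)
        min_d = [min(d, dist2(p, points[far])) for d, p in zip(min_d, points)]
    return selected

points = [[0.0], [0.1], [0.5], [1.0]]     # toy 1-D "weight vectors"
print(farthest_first(points, 2))          # [0, 3]: the two extremes
```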
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and misclassify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing file: diverg(10)292_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 292), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)292_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 700
Number of entity ID pairs that occurred more than once: 0
Analyse set of 700 weight vectors
Containing 214 true matches and 486 true non-matches
(30.57% true matches)
Identified 665 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 650 (97.74%)
2 : 12 (1.80%)
3 : 2 (0.30%)
20 : 1 (0.15%)
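An occurrence frequency distribution like the one above can be derived by counting duplicate weight vectors and then counting the counts; a toy sketch:

```python
from collections import Counter

# Toy weight vectors (as tuples so they are hashable)
vecs = [(0.5, 1.0), (0.5, 1.0), (0.3, 0.0), (0.9, 0.2)]
per_vector = Counter(vecs)          # vector -> how often it occurs
occ = Counter(per_vector.values())  # occurrence -> number of vectors that occur that often
print(sorted(occ.items()))          # [(1, 2), (2, 1)]
```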
Identified 1 non-pure unique weight vector (from 665 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 485
Removed 1 non-pure weight vector
Final number of weight vectors to use: 699
Number of unique weight vectors: 665
Time to load and analyse the weight vector file: 0.01 sec
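The pureness filtering above groups identical weight vectors and drops the minority-class copies from any group that mixes matches and non-matches. A sketch under that reading (the tie-breaking rule is an assumption):

```python
from collections import defaultdict

def filter_non_pure(vectors, labels):
    """Group identical weight vectors; in groups mixing matches and
    non-matches, drop the minority-class copies."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept_vecs, kept_labs = [], []
    for vec, labs in groups.items():
        n_match = sum(labs)
        majority = n_match * 2 >= len(labs)  # assumption: match wins ties
        for lab in labs:
            if lab == majority:
                kept_vecs.append(list(vec))
                kept_labs.append(lab)
    return kept_vecs, kept_labs

# Toy: one vector occurs 3 times with pureness 2/3, so its 1 minority copy goes
vecs = [[0.5], [0.5], [0.5], [0.9]]
labs = [True, True, False, True]
kept_v, kept_l = filter_non_pure(vecs, labs)
print(len(vecs) - len(kept_v))  # 1 weight vector removed
```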
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(665, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 665 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 665 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and misclassify 0
Classified 35 matches and 49 non-matches
Purity of oracle classification: 0.583
Entropy of oracle classification: 0.980
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 49
Number of false non-matches: 0
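The oracle's accuracy parameter suggests each true label is kept with probability equal to the accuracy and flipped otherwise, so 100.00% accuracy reproduces the ground truth exactly. A sketch under that assumption (the seeding is hypothetical):

```python
import random

def noisy_oracle(true_labels, accuracy, seed=0):
    """Simulated oracle: keep each true label with probability `accuracy`,
    flip it otherwise; accuracy 1.0 reproduces the ground truth exactly."""
    rng = random.Random(seed)
    return [lab if rng.random() < accuracy else not lab for lab in true_labels]

print(noisy_oracle([True, False, True], 1.0))  # [True, False, True]
```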
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 581 weight vectors
Based on 35 matches and 49 non-matches
Classified 252 matches and 329 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(252, 0.5833333333333334, 0.9798687566511527, 0.4166666666666667)
(329, 0.5833333333333334, 0.9798687566511527, 0.4166666666666667)
Current size of match and non-match training data sets: 35 / 49
Selected cluster (queue ordering: random) with:
- Purity 0.58 and entropy 0.98
- Size 252 weight vectors
- Estimated match proportion 0.417
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 252 vectors
The selected farthest weight vectors are:
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.800, 1.000, 0.211, 0.133, 0.074, 0.133, 0.185] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and misclassify 0
Classified 46 matches and 22 non-matches
Purity of oracle classification: 0.676
Entropy of oracle classification: 0.908
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 22
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(15)746_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.99
recall 0.331104
f-measure 0.496241
da 100
dm 0
ndm 0
tp 99
fp 1
tn 4.76529e+07
fn 200
Name: (15, 1 - acm diverg, 746), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)746_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 800
Number of entity ID pairs that occurred more than once: 0
Analyse set of 800 weight vectors
Containing 167 true matches and 633 true non-matches
(20.88% true matches)
Identified 761 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 732 (96.19%)
2 : 26 (3.42%)
3 : 2 (0.26%)
10 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 761 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 148
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 612
Removed 1 non-pure weight vector
Final number of weight vectors to use: 799
Number of unique weight vectors: 761
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(761, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 761 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 761 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and misclassify 0
Classified 26 matches and 59 non-matches
Purity of oracle classification: 0.694
Entropy of oracle classification: 0.888
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 676 weight vectors
Based on 26 matches and 59 non-matches
Classified 89 matches and 587 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(89, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
(587, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
Current size of match and non-match training data sets: 26 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 587 weight vectors
- Estimated match proportion 0.306
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 587 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and misclassify 0
Classified 16 matches and 55 non-matches
Purity of oracle classification: 0.775
Entropy of oracle classification: 0.770
Number of true matches: 16
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing file: diverg(15)910_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 910), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)910_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 860
Number of entity ID pairs that occurred more than once: 0
Analyse set of 860 weight vectors
Containing 227 true matches and 633 true non-matches
(26.40% true matches)
Identified 803 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 766 (95.39%)
2 : 34 (4.23%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 803 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 612
Removed 1 non-pure weight vector
Final number of weight vectors to use: 859
Number of unique weight vectors: 803
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(803, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 803 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 803 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
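The "far" method above is the classic greedy farthest-first traversal: seed the selection, then repeatedly add the vector whose minimum Euclidean distance to the already-selected set is largest. A stdlib-only sketch; the tool's seeding and tie-breaking rules are assumptions:

```python
import math
import random

def farthest_first(vectors, k, seed=0):
    """Return the indices of k vectors chosen greedily, each maximising
    its minimum Euclidean distance to the vectors selected so far
    (a random first pick is an assumption)."""
    rng = random.Random(seed)
    selected = [rng.randrange(len(vectors))]
    # distance of every vector to the closest selected vector so far
    min_dist = [math.dist(v, vectors[selected[0]]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [min(d, math.dist(v, vectors[nxt]))
                    for d, v in zip(min_dist, vectors)]
    return selected

# toy set: selecting 2 of 3 points always includes the outlier [5, 5]
picked = farthest_first([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0]], 2)
```

This greedy rule is why the selected samples above are spread across the extremes of the weight space rather than clustered.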
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
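Purity 0.671 and entropy 0.914 here are simply the majority-class fraction and the binary Shannon entropy of the 28-match / 57-non-match oracle split:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class fraction and binary entropy (in bits) of a cluster."""
    p = num_matches / (num_matches + num_non_matches)
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = purity_entropy(28, 57)
print(round(purity, 3), round(entropy, 3))  # 0.671 0.914
```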
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 718 weight vectors
Based on 28 matches and 57 non-matches
Classified 155 matches and 563 non-matches
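The split step trains a classifier on the 85 oracle-labelled vectors and partitions the remaining 718 into predicted matches and non-matches. A minimal scikit-learn sketch; the log does not show the kernel or parameters, so `SVC(kernel="linear")` is an assumption:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(match_train, non_match_train, unlabelled):
    """Fit an SVM on the oracle-labelled weight vectors, then split the
    unlabelled vectors into predicted matches and non-matches."""
    X = np.vstack([match_train, non_match_train])
    y = np.array([1] * len(match_train) + [0] * len(non_match_train))
    clf = SVC(kernel="linear").fit(X, y)   # kernel choice is assumed
    preds = clf.predict(np.asarray(unlabelled, dtype=float))
    matches = [v for v, p in zip(unlabelled, preds) if p == 1]
    non_matches = [v for v, p in zip(unlabelled, preds) if p == 0]
    return matches, non_matches

# tiny separable example: one unlabelled vector lands on each side
pred_matches, pred_non = svm_split(
    [[0.9, 0.9], [1.0, 0.8]], [[0.1, 0.2], [0.0, 0.1]],
    [[0.95, 0.85], [0.05, 0.15]])
```

The two predicted clusters then re-enter the queue, which is why the next loop shows a queue length of 2.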
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(155, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(563, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 563 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 563 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 5 matches and 69 non-matches
Purity of oracle classification: 0.932
Entropy of oracle classification: 0.357
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)657_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 657), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)657_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 882
Number of entity ID pairs that occurred more than once: 0
Analyse set of 882 weight vectors
Containing 212 true matches and 670 true non-matches
(24.04% true matches)
Identified 830 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 794 (95.66%)
2 : 33 (3.98%)
3 : 2 (0.24%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 830 unique weight vectors)
Pureness (as a fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.938 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 649
Removed 1 non-pure weight vector
Final number of weight vectors to use: 881
Number of unique weight vectors: 830
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(830, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 830 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 830 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 744 weight vectors
Based on 29 matches and 57 non-matches
Classified 163 matches and 581 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(163, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(581, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 163 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 163 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 47 matches and 9 non-matches
Purity of oracle classification: 0.839
Entropy of oracle classification: 0.636
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(20)564_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 564), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)564_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1035
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1035 weight vectors
Containing 223 true matches and 812 true non-matches
(21.55% true matches)
Identified 981 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 944 (96.23%)
2 : 34 (3.47%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 981 unique weight vectors)
Pureness (as a fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 791
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1034
Number of unique weight vectors: 981
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(981, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 981 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 981 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 894 weight vectors
Based on 28 matches and 59 non-matches
Classified 156 matches and 738 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(738, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 156 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 156 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
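The "farthest first" sampling used in each loop is the classic greedy max-min traversal; a sketch under the assumptions of Euclidean distance and a fixed starting vector (the actual distance metric and seed choice are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: begin at index `start`, then
    repeatedly add the vector whose minimum Euclidean distance to the
    already-chosen set is largest. A sketch of the sampling idea, not
    the original implementation."""
    vecs = np.asarray(vectors, dtype=float)
    chosen = [start]
    # distance from every vector to its nearest already-chosen vector
    min_dist = np.linalg.norm(vecs - vecs[start], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(vecs - vecs[nxt], axis=1))
    return chosen
```

Each new pick maximises the minimum distance to the current sample, which is why the selected weight vectors above spread across the extremes of the cluster rather than its centre.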
42.0
Analysing file: diverg(15)771_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 771), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)771_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 530
Number of entity ID pairs that occurred more than once: 0
Analyse set of 530 weight vectors
Containing 208 true matches and 322 true non-matches
(39.25% true matches)
Identified 501 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 484 (96.61%)
2 : 14 (2.79%)
3 : 2 (0.40%)
12 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 501 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 319
Removed 1 non-pure weight vector
Final number of weight vectors to use: 529
Number of unique weight vectors: 501
Time to load and analyse the weight vector file: 0.01 sec
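The load-and-analyse step above — grouping identical weight vectors, tabulating their occurrence frequencies, and removing minority-class records from non-pure unique vectors — can be sketched as follows (hypothetical helper, assuming boolean match labels; tie handling is an assumption):

```python
from collections import Counter, defaultdict

def prune_non_pure(vectors, labels):
    """Group identical weight vectors, count how often each unique vector
    occurs, and drop minority-class records from non-pure unique vectors.
    Hypothetical helper mirroring the analysis step in the log."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    # occurrence : number of unique weight vectors occurring that often
    freq = Counter(len(labs) for labs in groups.values())
    kept = []
    for vec, labs in groups.items():
        majority = sum(labs) / len(labs) >= 0.5  # True if matches dominate
        kept.extend((list(vec), lab) for lab in labs if lab == majority)
    return kept, freq
```

In the run above this removes a single record: the one non-match sharing a weight vector whose other eleven occurrences are matches (pureness 0.917).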
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(501, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 501 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 501 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 32 matches and 48 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 421 weight vectors
Based on 32 matches and 48 non-matches
Classified 140 matches and 281 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6, 0.9709505944546686, 0.4)
(281, 0.6, 0.9709505944546686, 0.4)
Current size of match and non-match training data sets: 32 / 48
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 140 weight vectors
- Estimated match proportion 0.400
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 140 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 53 matches and 3 non-matches
Purity of oracle classification: 0.946
Entropy of oracle classification: 0.301
Number of true matches: 53
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)138_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 138), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)138_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 432
Number of entity ID pairs that occurred more than once: 0
Analyse set of 432 weight vectors
Containing 184 true matches and 248 true non-matches
(42.59% true matches)
Identified 411 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 401 (97.57%)
2 : 7 (1.70%)
3 : 2 (0.49%)
11 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 411 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 163
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 247
Removed 1 non-pure weight vector
Final number of weight vectors to use: 431
Number of unique weight vectors: 411
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(411, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 411 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 411 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 33 matches and 45 non-matches
Purity of oracle classification: 0.577
Entropy of oracle classification: 0.983
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 45
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 333 weight vectors
Based on 33 matches and 45 non-matches
Classified 124 matches and 209 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(124, 0.5769230769230769, 0.9828586897127056, 0.4230769230769231)
(209, 0.5769230769230769, 0.9828586897127056, 0.4230769230769231)
Current size of match and non-match training data sets: 33 / 45
Selected cluster (queue ordering: random) with:
- Purity 0.58 and entropy 0.98
- Size 124 weight vectors
- Estimated match proportion 0.423
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 124 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 47 matches and 7 non-matches
Purity of oracle classification: 0.870
Entropy of oracle classification: 0.556
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(15)217_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (15, 1 - acm diverg, 217), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)217_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 863
Number of entity ID pairs that occurred more than once: 0
Analyse set of 863 weight vectors
Containing 160 true matches and 703 true non-matches
(18.54% true matches)
Identified 829 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 799 (96.38%)
2 : 27 (3.26%)
3 : 2 (0.24%)
4 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 829 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 146
0.000 : 683
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 863
Number of unique weight vectors: 829
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(829, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 829 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 829 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
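The "farthest first" sampling reported throughout this log can be sketched as a greedy farthest-point traversal: seed with one vector, then repeatedly add the vector whose minimum distance to the already selected set is largest. This is a minimal sketch under assumed Euclidean distance and an arbitrary seed; the tool's actual metric and seeding are not shown in the log.

```python
import math

def farthest_first(vectors, k):
    # Greedy farthest-point traversal over a list of numeric tuples.
    selected = [vectors[0]]              # arbitrary seed (assumption)
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the vector maximising its distance to the nearest selected one.
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.9, 0.1)], 3)
print(sample)  # [(0.0, 0.0), (1.0, 1.0), (0.9, 0.1)]
```

The traversal favours spread-out, diverse weight vectors, which is why the selected lists above mix near-all-1.0 and near-all-0.0 vectors.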
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and misclassify 0
Classified 26 matches and 60 non-matches
Purity of oracle classification: 0.698
Entropy of oracle classification: 0.884
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
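The purity and entropy figures above follow directly from the match/non-match split (26 of 86 are matches): purity is the majority-class fraction and entropy is the binary Shannon entropy of the class proportions. A minimal sketch:

```python
from math import log2

def purity_entropy(n_match, n_nonmatch):
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)  # majority-class fraction
    # Binary Shannon entropy of the class proportions (0 for a pure cluster).
    entropy = -sum(q * log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy

purity, entropy = purity_entropy(26, 60)
print(round(purity, 3), round(entropy, 3))  # 0.698 0.884
```

These are the same values carried into the cluster queue tuples printed in Loop 2 below.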
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 743 weight vectors
Based on 26 matches and 60 non-matches
Classified 94 matches and 649 non-matches
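The SVM split step takes the oracle-labelled vectors as training data and partitions the remaining unlabelled vectors into a predicted-match and a predicted-non-match child cluster. A sketch using scikit-learn's `SVC` on synthetic stand-in data; the real kernel, parameters, and weight vectors are not shown in the log.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for the 7-dimensional oracle-labelled weight vectors.
matches = rng.uniform(0.7, 1.0, size=(26, 7))
non_matches = rng.uniform(0.0, 0.5, size=(60, 7))
X_train = np.vstack([matches, non_matches])
y_train = np.array([1] * 26 + [0] * 60)

clf = SVC()  # default RBF kernel is an assumption
clf.fit(X_train, y_train)

# Split the remaining unlabelled vectors into two child clusters.
unlabelled = rng.uniform(0.0, 1.0, size=(743, 7))
pred = clf.predict(unlabelled)
match_cluster = unlabelled[pred == 1]
non_match_cluster = unlabelled[pred == 0]
print(len(match_cluster), len(non_match_cluster))
```

Both child clusters are then pushed onto the queue, each inheriting the parent's purity and entropy estimates until they are sampled themselves.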
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.6976744186046512, 0.8841151220488478, 0.3023255813953488)
(649, 0.6976744186046512, 0.8841151220488478, 0.3023255813953488)
Current size of match and non-match training data sets: 26 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 94 weight vectors
- Estimated match proportion 0.302
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 94 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and misclassify 0
Classified 43 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing the file: diverg(15)922_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 922), dtype: object
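The precision, recall, and f-measure printed in the dump above follow from its confusion counts (tp 39, fp 0, fn 260):

```python
# Confusion counts from the stats dump above.
tp, fp, fn = 39, 0, 260
precision = tp / (tp + fp)                  # no false positives -> 1.0
recall = tp / (tp + fn)                     # 39 of 299 true matches found
f_measure = 2 * precision * recall / (precision + recall)
print(precision, round(recall, 6), round(f_measure, 6))  # 1.0 0.130435 0.230769
```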
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)922_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
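The occurrence distribution above is a two-level count: first count how often each exact weight vector appears, then count how many vectors share each occurrence count. A toy sketch:

```python
from collections import Counter

vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (1.0, 1.0)]  # toy data
per_vector = Counter(vectors)                # vector -> occurrence count
distribution = Counter(per_vector.values())  # occurrence count -> #vectors
print(sorted(distribution.items()))  # [(1, 2), (2, 1)]
```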
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
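The removal step drops the minority-class copies of any unique weight vector whose labels disagree. In this run the single vector occurring 20 times carries 19 match labels (pureness 0.95), so its one non-match copy is removed, taking 808 vectors down to 807. A sketch of that filtering:

```python
from collections import defaultdict

def remove_minority(labelled_vectors):
    # labelled_vectors: list of (vector_tuple, is_match) pairs.
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[vec].append(is_match)
    kept = []
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)
        if pureness in (0.0, 1.0):
            # Pure vector: keep every copy.
            kept.extend((vec, lab) for lab in labels)
        else:
            # Non-pure vector: keep only the majority-class copies.
            majority = pureness >= 0.5
            kept.extend((vec, lab) for lab in labels if lab == majority)
    return kept

# 19 match copies + 1 non-match copy of one vector, plus one pure vector.
data = [((0.9, 0.8), True)] * 19 + [((0.9, 0.8), False)] + [((0.1, 0.2), False)]
kept = remove_minority(data)
print(len(kept))  # 20
```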
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and misclassify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 28 matches and 57 non-matches
Classified 141 matches and 543 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 141 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 141 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and misclassify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)500_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 500), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)500_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1043
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1043 weight vectors
Containing 222 true matches and 821 true non-matches
(21.28% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 952 (96.26%)
2 : 34 (3.44%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 800
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1042
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and misclassify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 902 weight vectors
Based on 26 matches and 61 non-matches
Classified 145 matches and 757 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(757, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 145 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 145 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and misclassify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)41_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 41), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)41_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
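The farthest-first selection above can be sketched as a greedy traversal that repeatedly adds the vector whose minimum Euclidean distance to the already-selected set is largest. This is a minimal sketch; the seeding strategy and distance metric of the actual implementation are assumptions:

```python
import math

def farthest_first(vectors, k):
    """Greedily select k vectors (tuples), each maximising its minimum
    Euclidean distance to the vectors selected so far.

    Seeding with the first vector is an assumption; implementations
    often seed with a random vector or the centroid's nearest point.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]
    while len(selected) < k:
        # pick the unselected vector farthest from the selected set
        best = max((v for v in vectors if v not in selected),
                   key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
    return selected
```

Because each pick maximises the minimum distance to the current sample, the selected vectors spread out over the weight-vector space, which is why the listings above mix extreme corners like all-ones and all-zeros vectors with intermediate points.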
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
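The reported purity and entropy are consistent with the usual two-class definitions: purity is the majority-class fraction of the classified sample, and entropy is the binary entropy of the match proportion. A sketch of the presumed formulas, which reproduces the 0.724 / 0.850 values above from 24 matches and 63 non-matches:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary entropy of a two-class sample.

    Presumed formulas; they reproduce the purity/entropy values in
    the log exactly.
    """
    total = num_matches + num_non_matches
    p = num_matches / total           # match proportion
    purity = max(p, 1.0 - p)          # fraction in the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                   # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

A perfectly mixed sample (p = 0.5) gives purity 0.5 and entropy 1.0, matching the initial queue entry `(916, 0.5, 1.0, 0.5)`.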
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
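The split step presumably trains a binary classifier on the oracle-labelled samples and partitions the remaining cluster by its predictions. The log only says "SVM", so scikit-learn's `SVC` with a linear kernel is an assumption in this sketch:

```python
from sklearn.svm import SVC  # assumed implementation; the log only says "SVM"

def split_cluster(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on oracle-labelled weight vectors (labels 1 = match,
    0 = non-match) and split the remaining vectors into predicted
    matches and non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if not p]
    return matches, non_matches
```

The two resulting sub-clusters (here 123 predicted matches and 706 predicted non-matches) are pushed back onto the queue for the next loop, which explains the queue length of 2 below.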
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)527_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 527), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)527_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
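An occurrence distribution like the one above can be reproduced by counting duplicate weight vectors with `collections.Counter` twice: once over the vectors, then over the resulting counts (a sketch, not the pipeline's actual code):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique weight vectors that
    occur exactly that often in the input."""
    vec_counts = Counter(tuple(v) for v in vectors)  # vector -> occurrences
    return Counter(vec_counts.values())              # occurrences -> how many vectors
```

For example, a list where one vector appears twice and another once yields `{2: 1, 1: 1}`; summing `count * num_vectors` over the distribution recovers the total of 1094 weight vectors reported above.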
Identified 1 non-pure unique weight vectors (from 1038 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 103 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(10)96_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (10, 1 - acm diverg, 96), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)96_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 190
Number of entity ID pairs that occurred more than once: 0
Analyse set of 190 weight vectors
Containing 143 true matches and 47 true non-matches
(75.26% true matches)
Identified 178 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 170 (95.51%)
2 : 5 (2.81%)
3 : 2 (1.12%)
4 : 1 (0.56%)
Identified 0 non-pure unique weight vectors (from 178 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 131
0.000 : 47
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 190
Number of unique weight vectors: 178
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(178, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 178 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 62
Perform initial selection using "far" method
Farthest first selection of 62 weight vectors from 178 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 62 weight vectors
The oracle will correctly classify 62 weight vectors and wrongly classify 0
Classified 32 matches and 30 non-matches
Purity of oracle classification: 0.516
Entropy of oracle classification: 0.999
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 30
Number of false non-matches: 0
Deleted 62 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 116 weight vectors
Based on 32 matches and 30 non-matches
Classified 108 matches and 8 non-matches
Non-match cluster not large enough for required sample size
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 1
Number of manual oracle classifications performed: 62
Size, purity, entropy, and estimated match proportion of clusters in queue:
(108, 0.5161290322580645, 0.9992492479956565, 0.5161290322580645)
Current size of match and non-match training data sets: 32 / 30
Selected cluster (queue ordering: random) with:
- Purity 0.52 and entropy 1.00
- Size 108 weight vectors
- Estimated match proportion 0.516
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 108 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 42 matches and 9 non-matches
Purity of oracle classification: 0.824
Entropy of oracle classification: 0.672
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
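The purity and entropy figures reported above are consistent with the standard two-class definitions: purity is the majority-class fraction, entropy the base-2 Shannon entropy of the match/non-match split. A minimal sketch (the function name is ours, not the tool's):

```python
from math import log2

def purity_entropy(n_match, n_nonmatch):
    """Purity = fraction of the majority class; entropy = Shannon
    entropy (base 2) of the match / non-match split."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:  # 0 * log2(0) is taken as 0
            entropy -= q * log2(q)
    return purity, entropy

# The 42 true matches and 9 true non-matches classified above:
purity, entropy = purity_entropy(42, 9)
print(round(purity, 3), round(entropy, 3))  # → 0.824 0.672
```

This reproduces the logged values 0.824 and 0.672 exactly.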
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing file: diverg(10)265_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 265), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)265_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 691
Number of entity ID pairs that occurred more than once: 0
Analyse set of 691 weight vectors
Containing 191 true matches and 500 true non-matches
(27.64% true matches)
Identified 667 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 650 (97.45%)
2 : 14 (2.10%)
3 : 2 (0.30%)
7 : 1 (0.15%)
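The unique-vector count and the occurrence distribution above can be derived with two nested Counters: one over the vectors themselves, one over their counts. A minimal sketch on a hypothetical five-vector sample (the real run uses the 691 loaded 7-dimensional vectors):

```python
from collections import Counter

# Hypothetical small sample standing in for the loaded weight vectors.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9), (1.0, 0.5)]

occ = Counter(map(tuple, vectors))   # occurrences per unique vector
dist = Counter(occ.values())         # occurrence : number of vectors
print(len(occ), dict(dist))          # → 3 {3: 1, 1: 2}
```

Here 3 unique vectors are identified; one occurs three times and two occur once, mirroring the "Occurrence : Count" table format of the log.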
Identified 0 non-pure unique weight vectors (from 667 unique weight vectors)
Pureness (proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 169
0.000 : 498
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 691
Number of unique weight vectors: 667
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(667, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 667 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 667 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
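The "far" initial selection logged above corresponds to the classic greedy farthest-first traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and an arbitrary starting vector (the tool's actual distance metric and seed vector are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly pick the vector
    whose minimum Euclidean distance to the selected set is largest."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]
    # Distance of every vector to its closest selected vector so far.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

picks = farthest_first([[0, 0], [1, 0], [0.5, 0], [10, 0]], k=3)
print(picks)  # → [0, 3, 1]
```

The outlier at (10, 0) is chosen second, then the point farthest from both, which is why the sampled vectors above spread across the extremes of the weight space.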
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 34 matches and 50 non-matches
Purity of oracle classification: 0.595
Entropy of oracle classification: 0.974
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 583 weight vectors
Based on 34 matches and 50 non-matches
Classified 274 matches and 309 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(274, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
(309, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
Current size of match and non-match training data sets: 34 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 274 weight vectors
- Estimated match proportion 0.405
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 274 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 43 matches and 26 non-matches
Purity of oracle classification: 0.623
Entropy of oracle classification: 0.956
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(10)208_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987952
recall 0.274247
f-measure 0.429319
da 83
dm 0
ndm 0
tp 82
fp 1
tn 4.76529e+07
fn 217
Name: (10, 1 - acm diverg, 208), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)208_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 576
Number of entity ID pairs that occurred more than once: 0
Analyse set of 576 weight vectors
Containing 161 true matches and 415 true non-matches
(27.95% true matches)
Identified 556 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 545 (98.02%)
2 : 8 (1.44%)
3 : 2 (0.36%)
9 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 556 unique weight vectors)
Pureness (proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 143
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 412
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 567
Number of unique weight vectors: 555
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(555, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 555 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 555 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 29 matches and 53 non-matches
Purity of oracle classification: 0.646
Entropy of oracle classification: 0.937
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 473 weight vectors
Based on 29 matches and 53 non-matches
Classified 119 matches and 354 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(119, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
(354, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
Current size of match and non-match training data sets: 29 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 119 weight vectors
- Estimated match proportion 0.354
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 119 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 45 matches and 6 non-matches
Purity of oracle classification: 0.882
Entropy of oracle classification: 0.523
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
83.0
Analysing file: diverg(10)39_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 39), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)39_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 579
Number of entity ID pairs that occurred more than once: 0
Analyse set of 579 weight vectors
Containing 151 true matches and 428 true non-matches
(26.08% true matches)
Identified 562 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 553 (98.40%)
2 : 6 (1.07%)
3 : 2 (0.36%)
8 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 562 unique weight vectors)
Pureness (proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 136
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 425
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 571
Number of unique weight vectors: 561
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(561, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 561 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 561 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 32 matches and 50 non-matches
Purity of oracle classification: 0.610
Entropy of oracle classification: 0.965
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 479 weight vectors
Based on 32 matches and 50 non-matches
Classified 106 matches and 373 non-matches
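The SVM step trains on the oracle-labelled sample and classifies the remaining vectors of the cluster. A hedged sketch using scikit-learn's `SVC`; the kernel and parameters used by the actual program are not shown in the log, so library defaults and random stand-in data are assumed here:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the oracle-labelled sample: 32 matches, 50 non-matches
# (7-dimensional weight vectors, as in the log).
X_train = rng.random((82, 7))
y_train = np.array([True] * 32 + [False] * 50)

# Remaining unlabelled weight vectors of the cluster.
X_rest = rng.random((479, 7))

clf = SVC()  # kernel/parameters are an assumption; the log does not state them
clf.fit(X_train, y_train)
pred = clf.predict(X_rest)
print(int(pred.sum()), int(len(pred) - pred.sum()))  # predicted matches / non-matches
```

The predicted match and non-match groups then become the two new clusters pushed onto the queue.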
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(106, 0.6097560975609756, 0.9649567669505688, 0.3902439024390244)
(373, 0.6097560975609756, 0.9649567669505688, 0.3902439024390244)
Current size of match and non-match training data sets: 32 / 50
Selected cluster with (queue ordering: random):
- Purity 0.61 and entropy 0.96
- Size 106 weight vectors
- Estimated match proportion 0.390
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 106 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
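The farthest-first selections logged throughout can be sketched as a standard farthest-first traversal: start from one vector and repeatedly pick the vector whose minimum distance to the already-selected set is largest. The starting rule and the Euclidean metric are assumptions; the log does not state them:

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Select k vectors, each maximising its distance to the chosen set."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

# e.g. 49 of 106 seven-dimensional weight vectors, as in the round above
rng = np.random.default_rng(1)
sample = farthest_first(rng.random((106, 7)), k=49)
print(len(sample))  # 49
```

Each iteration is O(n) after the distance update, so the whole selection is O(nk).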
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 42 matches and 7 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analyzing file: diverg(15)249_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 249), dtype: object
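The precision, recall, and f-measure fields in the Series above are the standard definitions over the tp/fp/fn counts that follow them. A quick check against the logged values:

```python
# Counts from the Series above
tp, fp, fn = 45, 1, 254

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(round(precision, 6), round(recall, 6), round(f_measure, 5))
# 0.978261 0.150502 0.26087
```

All three agree with the `precision`, `recall`, and `f-measure` fields printed in the log.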
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)249_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 810
Number of entity ID pairs that occurred more than once: 0
Analyse set of 810 weight vectors
Containing 219 true matches and 591 true non-matches
(27.04% true matches)
Identified 754 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 718 (95.23%)
2 : 33 (4.38%)
3 : 2 (0.27%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 754 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 570
Removed 1 non-pure weight vector
Final number of weight vectors to use: 809
Number of unique weight vectors: 754
Time to load and analyse the weight vector file: 0.01 sec
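The frequency and pureness analysis above can be reproduced with a `Counter` over the weight-vector tuples, where pureness is the fraction of true matches among the records sharing one unique vector. The tiny `records` list below is a hypothetical stand-in (real vectors carry 7 similarity weights):

```python
from collections import Counter

# Hypothetical stand-in for (weight_vector, is_match) pairs loaded from the CSV
records = [
    ((1.0, 1.0, 0.9), True),
    ((1.0, 1.0, 0.9), True),
    ((1.0, 0.0, 0.1), False),
    ((0.9, 0.5, 0.4), True),
    ((0.9, 0.5, 0.4), False),  # non-pure: mixed labels
]

freq = Counter(vec for vec, _ in records)
occurrence_dist = Counter(freq.values())  # occurrence count -> number of vectors

pureness = {}
for vec in freq:
    labels = [m for v, m in records if v == vec]
    pureness[vec] = sum(labels) / len(labels)

non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]
print(occurrence_dist, len(non_pure))
```

A non-pure vector's minority-class records are then dropped, which is what the "minority class weight vectors ... to be removed" note in the log refers to.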
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(754, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 754 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 754 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 25 matches and 60 non-matches
Purity of oracle classification: 0.706
Entropy of oracle classification: 0.874
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 669 weight vectors
Based on 25 matches and 60 non-matches
Classified 122 matches and 547 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(122, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
(547, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
Current size of match and non-match training data sets: 25 / 60
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 122 weight vectors
- Estimated match proportion 0.294
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 122 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analyzing file: diverg(20)500_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 500), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)500_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(10)941_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985915
recall 0.234114
f-measure 0.378378
da 71
dm 0
ndm 0
tp 70
fp 1
tn 4.76529e+07
fn 229
Name: (10, 1 - acm diverg, 941), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)941_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 870
Number of entity ID pairs that occurred more than once: 0
Analyse set of 870 weight vectors
Containing 186 true matches and 684 true non-matches
(21.38% true matches)
Identified 830 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 796 (95.90%)
2 : 31 (3.73%)
3 : 2 (0.24%)
6 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 830 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.000 : 664
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 870
Number of unique weight vectors: 830
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(830, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 830 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 830 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
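The "far" (farthest-first) pass above can be sketched as a greedy k-center selection. This is a minimal sketch, assuming Euclidean distance over the 7-dimensional weight vectors; the function name and the random stand-in pool are illustrative, not the tool's actual code:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy k-center selection: repeatedly pick the vector farthest
    from everything selected so far."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    idx = [int(rng.integers(len(vectors)))]              # random starting vector
    min_dist = np.linalg.norm(vectors - vectors[idx[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))                   # farthest from selection
        idx.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return vectors[idx]

pool = np.random.default_rng(1).random((830, 7))         # stand-in for the cluster
sample = farthest_first(pool, 86)
print(sample.shape)  # (86, 7)
```

Each step costs one distance computation over the pool, so selecting k of n vectors is O(kn); the selected sample spreads out over the cluster, which is why the labelled vectors above mix clear matches and clear non-matches.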
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
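The purity and entropy figures above can be reproduced from the match counts, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy (a sketch, not the tool's code):

```python
from math import log2

def purity_entropy(n_match, n_nonmatch):
    """Majority-class fraction and binary Shannon entropy of a labelled sample."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = -sum(q * log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy

purity, entropy = purity_entropy(28, 58)
print(round(purity, 3), round(entropy, 3))  # 0.674 0.91
```

The same p = 28/86 ≈ 0.326 reappears below as the "estimated match proportion" carried into Loop 2.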
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 744 weight vectors
Based on 28 matches and 58 non-matches
Classified 157 matches and 587 non-matches
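The SVM split above could look like the following, assuming a scikit-learn-style SVC; the kernel choice and the random stand-in data are assumptions, not the tool's actual setup:

```python
import numpy as np
from sklearn.svm import SVC   # assumption: an SVM akin to scikit-learn's SVC

rng = np.random.default_rng(0)
X_train = rng.random((86, 7))                 # stand-ins for the labelled vectors
y_train = np.array([1] * 28 + [0] * 58)       # 28 matches, 58 non-matches
X_rest = rng.random((744, 7))                 # vectors left in the cluster

clf = SVC(kernel="linear").fit(X_train, y_train)
pred = clf.predict(X_rest)
match_cluster = X_rest[pred == 1]             # the two child clusters
nonmatch_cluster = X_rest[pred == 0]          # pushed back onto the queue
print(len(match_cluster) + len(nonmatch_cluster))  # 744
```

In the log both children inherit the parent sample's purity, entropy, and match-proportion estimates until they are themselves sampled.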
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(157, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(587, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 587 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 587 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.857, 0.417, 0.750, 0.500, 0.455] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.692, 0.292, 0.500, 0.818, 0.308] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 0 matches and 74 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
*** Warning: Oracle returned an empty match dictionary ***
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
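The overall procedure this log traces (pop a random cluster, oracle-label a farthest-first sample, split the remainder with a classifier, re-queue both halves, stop at the budget) can be condensed into a hypothetical sketch; every name, the toy oracle, and the toy splitter are illustrative only:

```python
import random

def active_split_loop(root_cluster, oracle, split, budget, sample_size):
    """Condensed form of the loop traced in this log: pop a random cluster,
    oracle-label a sample of it, split the remainder, re-queue both halves."""
    rng = random.Random(0)
    queue, train, labelled = [root_cluster], [], 0
    while queue and labelled < budget:                  # budget checked per loop
        cluster = queue.pop(rng.randrange(len(queue)))  # queue ordering: random
        sample, rest = cluster[:sample_size], cluster[sample_size:]
        train += [(v, oracle(v)) for v in sample]       # manual classifications
        labelled += len(sample)
        if rest:
            queue += [half for half in split(rest, train) if half]
    return labelled

# Toy oracle and splitter: "match" means the first weight exceeds 0.5.
oracle = lambda v: v[0] > 0.5
split = lambda rest, train: ([v for v in rest if oracle(v)],
                             [v for v in rest if not oracle(v)])
rng = random.Random(1)
pool = [(rng.random(),) for _ in range(830)]
print(active_split_loop(pool, oracle, split, budget=150, sample_size=86))
```

As in the log, the budget can be overshot by up to one sample, since the check happens at the top of each loop.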
71.0
Analysing the file: diverg(10)603_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976378
recall 0.414716
f-measure 0.58216
da 127
dm 0
ndm 0
tp 124
fp 3
tn 4.76529e+07
fn 175
Name: (10, 1 - acm diverg, 603), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)603_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 634
Number of entity ID pairs that occurred more than once: 0
Analyse set of 634 weight vectors
Containing 137 true matches and 497 true non-matches
(21.61% true matches)
Identified 618 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 607 (98.22%)
2 : 8 (1.29%)
3 : 2 (0.32%)
5 : 1 (0.16%)
Identified 0 non-pure unique weight vectors (from 618 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 123
0.000 : 495
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 634
Number of unique weight vectors: 618
Time to load and analyse the weight vector file: 0.01 sec
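The uniqueness and frequency analysis above can be reproduced with a `Counter` over the rows; this sketch uses toy two-dimensional rows in place of the CSV's 7-dimensional weight vectors:

```python
from collections import Counter

# Toy rows standing in for weight vectors read from the CSV.
rows = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5),
        (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
counts = Counter(rows)                     # vector -> how often it occurs
print(len(counts))                         # 3 unique weight vectors
freq = Counter(counts.values())            # occurrence : number of vectors
print(sorted(freq.items()))                # [(1, 1), (2, 1), (3, 1)]
```

Pureness per unique vector follows the same pattern: group the true labels by vector and take the fraction of matches in each group.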
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(618, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 618 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 618 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 33 matches and 50 non-matches
Purity of oracle classification: 0.602
Entropy of oracle classification: 0.970
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 535 weight vectors
Based on 33 matches and 50 non-matches
Classified 227 matches and 308 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(227, 0.6024096385542169, 0.9695235828220428, 0.39759036144578314)
(308, 0.6024096385542169, 0.9695235828220428, 0.39759036144578314)
Current size of match and non-match training data sets: 33 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 308 weight vectors
- Estimated match proportion 0.398
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 308 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.462, 0.667, 0.600, 0.389, 0.615] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.680, 0.000, 0.609, 0.737, 0.600, 0.529, 0.696] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returned an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
127.0
Analysing the file: diverg(20)408_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 408), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)408_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority-class weight vectors at this pureness are removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 118 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 118 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
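Farthest-first selection, as used above, grows the sample by repeatedly taking the vector farthest from everything chosen so far. A minimal sketch assuming Euclidean distance and a fixed starting vector (neither is stated in the log):

```python
import math

def farthest_first(vectors, k):
    """Select k vectors, each maximising the distance to its nearest already-selected vector."""
    selected = [vectors[0]]  # starting point is an assumption; the log does not say
    while len(selected) < k:
        # for each remaining candidate, its distance to the closest selected vector
        best = max((v for v in vectors if v not in selected),
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
    return selected
```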
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
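The purity and entropy reported above follow directly from the oracle's counts: purity is the majority-class fraction (47/48 ≈ 0.979) and entropy is the binary entropy of the match proportion. A sketch reproducing both:

```python
import math

def purity(matches, non_matches):
    """Fraction of the majority class among the classified vectors."""
    total = matches + non_matches
    return max(matches, non_matches) / total

def binary_entropy(matches, non_matches):
    """Binary (Shannon) entropy of the match proportion, in bits."""
    p = matches / (matches + non_matches)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```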
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)139_NEW.csv
<class 'pandas.core.series.Series'>
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 139), dtype: object
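The precision, recall, and f-measure in the block above are the standard definitions over the logged tp/fp/fn counts (recall = 39/(39+260) ≈ 0.130435). A sketch:

```python
def prf(tp, fp, fn):
    """Standard precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```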
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)139_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
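The occurrence distribution above counts, for each multiplicity, how many distinct weight vectors occur that many times. A sketch using collections.Counter (assuming vectors compare as tuples):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in vectors)   # copies of each distinct vector
    return Counter(per_vector.values())               # histogram of those copy counts
```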
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
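The pureness filter reported above groups identical weight vectors, computes each group's match fraction, and removes the minority-class copies from non-pure groups (here, the single non-match copy of the 0.950-pure vector). A sketch, assuming (vector, label) pairs as input; names and tie handling are illustrative:

```python
from collections import defaultdict

def remove_minority_copies(pairs):
    """Drop minority-class copies of weight vectors whose duplicates disagree on the label."""
    groups = defaultdict(list)
    for vec, is_match in pairs:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, is_match in pairs:
        labels = groups[tuple(vec)]
        pureness = sum(labels) / len(labels)   # fraction of copies labelled as match
        majority_is_match = pureness >= 0.5    # tie handling is an assumption
        if is_match == majority_is_match:
            kept.append((vec, is_match))
    return kept
```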
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00 accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)735_NEW.csv
<class 'pandas.core.series.Series'>
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 735), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)735_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 895
Number of entity ID pairs that occurred more than once: 0
Analyse set of 895 weight vectors
Containing 199 true matches and 696 true non-matches
(22.23% true matches)
Identified 844 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 810 (95.97%)
2 : 31 (3.67%)
3 : 2 (0.24%)
17 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 844 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 168
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 675
Removed 1 non-pure weight vector
Final number of weight vectors to use: 894
Number of unique weight vectors: 844
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(844, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 844 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 844 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 30 matches and 56 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 758 weight vectors
Based on 30 matches and 56 non-matches
Classified 189 matches and 569 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(189, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
(569, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
Current size of match and non-match training data sets: 30 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 569 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 569 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.538, 0.789, 0.353, 0.545, 0.550] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.444, 0.643, 0.421, 0.200, 0.556] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[1.000, 0.000, 0.350, 0.455, 0.625, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.667, 0.444, 0.556, 0.222, 0.143] (False)
[1.000, 0.000, 0.583, 0.389, 0.471, 0.545, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.750, 0.533, 0.294, 0.333, 0.429] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 0 matches and 75 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 75
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing file: diverg(10)597_NEW.csv
<class 'pandas.core.series.Series'>
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985915
recall 0.234114
f-measure 0.378378
da 71
dm 0
ndm 0
tp 70
fp 1
tn 4.76529e+07
fn 229
Name: (10, 1 - acm diverg, 597), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)597_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 492
Number of entity ID pairs that occurred more than once: 0
Analyse set of 492 weight vectors
Containing 177 true matches and 315 true non-matches
(35.98% true matches)
Identified 474 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 462 (97.47%)
2 : 9 (1.90%)
3 : 2 (0.42%)
6 : 1 (0.21%)
Identified 0 non-pure unique weight vectors (from 474 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 159
0.000 : 315
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 492
Number of unique weight vectors: 474
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(474, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 474 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 474 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
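The "far" selection steps logged above can be illustrated with a greedy farthest-first traversal (repeatedly pick the vector whose minimum distance to the already selected set is largest). This is a sketch under assumed Euclidean distance and a random start, not the tool's actual implementation:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: start from a random vector, then
    repeatedly select the vector farthest from the selected set.
    Distance metric and seeding are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    selected = [rng.integers(len(vectors))]
    # Each vector's distance to its nearest already-selected vector
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

Because each new pick maximises the distance to the current sample, the selected weight vectors spread out over the cluster, which is why the lists above mix very high- and very low-similarity vectors.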
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 26 matches and 54 non-matches
Purity of oracle classification: 0.675
Entropy of oracle classification: 0.910
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
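The purity and entropy figures reported after each oracle step (here 0.675 and 0.910 for 26 matches versus 54 non-matches) follow from the standard two-class definitions; a minimal sketch:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy in bits of the
    match / non-match split."""
    n = num_matches + num_non_matches
    p = num_matches / n
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The oracle result above: 26 matches, 54 non-matches
purity, entropy = cluster_stats(26, 54)
```

Note that the estimated match proportion reported later (0.325) is simply the minority fraction 26/80 of the same oracle sample.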
SVM classification of 394 weight vectors
Based on 26 matches and 54 non-matches
Classified 144 matches and 250 non-matches
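The SVM split of the 394 remaining vectors, trained on the 26 oracle-labelled matches and 54 non-matches, can be sketched as follows (the classifier choice and parameters are assumptions; the log does not state them):

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on the oracle-labelled sample, then split the
    remaining weight vectors by predicted class. Kernel and parameters
    here are sklearn defaults, an assumption for illustration."""
    clf = SVC()
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(remaining_vecs)
    return remaining_vecs[pred == 1], remaining_vecs[pred == 0]
```

Each split produces two new clusters (here 144 predicted matches and 250 predicted non-matches), which is why the queue length grows by one per loop.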
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.675, 0.9097361225311662, 0.325)
(250, 0.675, 0.9097361225311662, 0.325)
Current size of match and non-match training data sets: 26 / 54
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 250 weight vectors
- Estimated match proportion 0.325
Sample size for this cluster: 63
Farthest first selection of 63 weight vectors from 250 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.533, 0.000, 0.577, 0.783, 0.429, 0.615, 0.478] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.462, 0.609, 0.643, 0.706, 0.786] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.778, 0.577, 0.455, 0.387, 0.357] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.950, 0.000, 0.619, 0.800, 0.478, 0.280, 0.625] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.667, 0.722, 0.353, 0.545, 0.800] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.407, 0.643, 0.667, 0.500, 0.563] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 63 weight vectors
The oracle will correctly classify 63 weight vectors and wrongly classify 0
Classified 1 match and 62 non-matches
Purity of oracle classification: 0.984
Entropy of oracle classification: 0.118
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 63 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
71.0
Analysing file: diverg(20)923_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 923), dtype: object
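The precision, recall, and f-measure values in these per-file summaries are consistent with the raw tp/fp/fn counts; for example, tp=45, fp=1, fn=254 yields precision 0.978261, recall 0.150502, and f-measure 0.26087:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from raw classification counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts reported in the summary above
precision, recall, f_measure = prf(45, 1, 254)
```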
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)923_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1086
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1086 weight vectors
Containing 220 true matches and 866 true non-matches
(20.26% true matches)
Identified 1030 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 994 (96.50%)
2 : 33 (3.20%)
3 : 2 (0.19%)
20 : 1 (0.10%)
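The occurrence frequency distribution above can be reproduced by counting duplicate weight vectors, e.g.:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how many distinct weight vectors occur once, twice, etc."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())

# Toy example: one vector occurs twice, one occurs once
dist = occurrence_distribution([(1.0, 0.5), (1.0, 0.5), (0.2, 0.3)])
```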
Identified 1 non-pure unique weight vector (from 1030 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1085
Number of unique weight vectors: 1030
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
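The non-pure vector removal above (one minority-class row dropped from the unique vector with pureness 0.950, reducing 1086 rows to 1085) can be sketched as follows; tie handling is an assumption:

```python
from collections import defaultdict

def remove_minority_rows(rows):
    """rows: list of (weight_tuple, is_match). For each unique weight
    vector with mixed labels, drop its minority-class rows so that every
    unique vector becomes pure. Ties are kept as matches (assumption)."""
    groups = defaultdict(list)
    for w, label in rows:
        groups[w].append(label)
    kept = []
    for w, label in rows:
        labels = groups[w]
        majority_is_match = sum(labels) * 2 >= len(labels)
        if label == majority_is_match:
            kept.append((w, label))
    return kept
```

For the case above, a vector occurring 20 times with 19 match labels (pureness 0.950) keeps its 19 match rows and loses the single non-match row.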
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1030, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1030 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1030 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 27 matches and 61 non-matches
Purity of oracle classification: 0.693
Entropy of oracle classification: 0.889
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 942 weight vectors
Based on 27 matches and 61 non-matches
Classified 142 matches and 800 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6931818181818182, 0.8894663896628687, 0.3068181818181818)
(800, 0.6931818181818182, 0.8894663896628687, 0.3068181818181818)
Current size of match and non-match training data sets: 27 / 61
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 142 weight vectors
- Estimated match proportion 0.307
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(20)146_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (20, 1 - acm diverg, 146), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)146_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1017
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1017 weight vectors
Containing 197 true matches and 820 true non-matches
(19.37% true matches)
Identified 975 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 940 (96.41%)
2 : 32 (3.28%)
3 : 2 (0.21%)
7 : 1 (0.10%)
Identified 0 non-pure unique weight vectors (from 975 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.000 : 800
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1017
Number of unique weight vectors: 975
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(975, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 975 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 975 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 25 matches and 62 non-matches
Purity of oracle classification: 0.713
Entropy of oracle classification: 0.865
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 888 weight vectors
Based on 25 matches and 62 non-matches
Classified 108 matches and 780 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(108, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
(780, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 25 / 62
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 780 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 780 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00 accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 14 matches and 57 non-matches
Purity of oracle classification: 0.803
Entropy of oracle classification: 0.716
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
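The purity and entropy figures logged for an oracle-classified sample follow from its binary match/non-match split. A minimal sketch, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy (this reproduces the values above):

```python
import math

def purity_entropy(n_match: int, n_nonmatch: int) -> tuple[float, float]:
    """Purity (majority-class fraction) and binary Shannon entropy
    of an oracle-classified sample of weight vectors."""
    n = n_match + n_nonmatch
    p = n_match / n
    purity = max(p, 1 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy

# 14 matches and 57 non-matches, as logged above:
purity, entropy = purity_entropy(14, 57)
print(round(purity, 3), round(entropy, 3))  # → 0.803 0.716
```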
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)811_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 811), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)811_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 822
Number of entity ID pairs that occurred more than once: 0
Analyse set of 822 weight vectors
Containing 226 true matches and 596 true non-matches
(27.49% true matches)
Identified 765 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 728 (95.16%)
2 : 34 (4.44%)
3 : 2 (0.26%)
20 : 1 (0.13%)
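A distribution of this shape ("how many vectors occur exactly n times", with percentages of the unique vectors) can be computed with `collections.Counter`; the vectors below are hypothetical stand-ins for the CSV rows:

```python
from collections import Counter

# Hypothetical weight vectors, as hashable tuples
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5),
           (0.2, 0.8), (0.2, 0.8), (0.2, 0.8)]

occurrences = Counter(vectors)                # vector -> occurrence count
distribution = Counter(occurrences.values())  # occurrence count -> #vectors
for occ, count in sorted(distribution.items()):
    print(f"{occ} : {count} ({count / len(occurrences):.2%})")
```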
Identified 1 non-pure unique weight vectors (from 765 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 575
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 821
Number of unique weight vectors: 765
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(765, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 765 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 765 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
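The farthest-first ("far") selection logged above can be sketched as a greedy traversal; the Euclidean distance and the choice of seed vector are assumptions here, not taken from the log:

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedily select k vectors: start from a seed, then repeatedly add
    the vector whose minimum distance to the selected set is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[seed_index]]
    while len(selected) < min(k, len(vectors)):
        best = max((v for v in vectors if v not in selected),
                   key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
    return selected

# Toy example: the two extremes are picked before the midpoint.
print(farthest_first([[0.0], [1.0], [10.0], [5.0]], 3))
# → [[0.0], [10.0], [5.0]]
```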
Perform oracle with 100.00 accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 680 weight vectors
Based on 27 matches and 58 non-matches
Classified 153 matches and 527 non-matches
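The SVM step above trains on the 85 oracle-labelled vectors and splits the 680 remaining cluster members into two child clusters. A sketch with scikit-learn; the library choice, the default RBF kernel, and the random stand-in data are assumptions (the real vectors come from the cluster):

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
X_labelled = rng.random((85, 7))              # 27 + 58 oracle-labelled vectors
y_labelled = np.array([1] * 27 + [0] * 58)    # 1 = match, 0 = non-match
X_remaining = rng.random((680, 7))            # still-unlabelled cluster members

clf = svm.SVC()                               # default RBF kernel (assumption)
clf.fit(X_labelled, y_labelled)
pred = clf.predict(X_remaining)

match_cluster = X_remaining[pred == 1]        # becomes one queue entry
non_match_cluster = X_remaining[pred == 0]    # becomes the other
```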
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(527, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
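The queue tuples above (size, purity, entropy, estimated match proportion) are derived from the parent cluster's oracle sample of 27 matches and 58 non-matches; both child clusters initially inherit the parent's statistics. A sketch of the derivation:

```python
import math

matches, non_matches = 27, 58            # oracle sample of the split cluster
n = matches + non_matches
match_proportion = matches / n           # the estimated match proportion
purity = max(match_proportion, 1 - match_proportion)
entropy = -sum(q * math.log2(q)
               for q in (match_proportion, 1 - match_proportion) if q > 0)
print(round(purity, 3), round(entropy, 3), round(match_proportion, 3))
# → 0.682 0.902 0.318
```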
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 527 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 527 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
Perform oracle with 100.00 accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 5 matches and 67 non-matches
Purity of oracle classification: 0.931
Entropy of oracle classification: 0.364
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)753_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (20, 1 - acm diverg, 753), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)753_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1025
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1025 weight vectors
Containing 198 true matches and 827 true non-matches
(19.32% true matches)
Identified 983 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 948 (96.44%)
2 : 32 (3.26%)
3 : 2 (0.20%)
7 : 1 (0.10%)
Identified 0 non-pure unique weight vectors (from 983 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.000 : 807
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1025
Number of unique weight vectors: 983
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(983, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 983 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 983 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00 accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 896 weight vectors
Based on 26 matches and 61 non-matches
Classified 93 matches and 803 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(93, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(803, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 93 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 93 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00 accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(10)231_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 231), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)231_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 769
Number of entity ID pairs that occurred more than once: 0
Analyse set of 769 weight vectors
Containing 196 true matches and 573 true non-matches
(25.49% true matches)
Identified 727 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 692 (95.19%)
2 : 32 (4.40%)
3 : 2 (0.28%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 727 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.000 : 553
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 769
Number of unique weight vectors: 727
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(727, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 727 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 727 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 25 matches and 60 non-matches
Purity of oracle classification: 0.706
Entropy of oracle classification: 0.874
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
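The purity and entropy figures reported after each oracle sample are consistent with the usual majority-class fraction and binary Shannon entropy of the match/non-match split. A minimal sketch of that reading (an assumption about the log, not the tool's actual code):

```python
import math

def purity(n_match, n_nonmatch):
    """Fraction of the sample belonging to the majority class."""
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    """Binary Shannon entropy (in bits) of the match / non-match split."""
    total = n_match + n_nonmatch
    p = n_match / total
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# The 25-match / 60-non-match oracle sample above:
print(round(purity(25, 60), 3))   # 0.706
print(round(entropy(25, 60), 3))  # 0.874
```

Both values reproduce the 0.706 purity and 0.874 entropy reported for this 85-vector sample.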
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 642 weight vectors
Based on 25 matches and 60 non-matches
Classified 98 matches and 544 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(98, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
(544, 0.7058823529411765, 0.8739810481273578, 0.29411764705882354)
Current size of match and non-match training data sets: 25 / 60
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 98 weight vectors
- Estimated match proportion 0.294
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 98 vectors
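The farthest-first selection named above is the standard greedy traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A sketch assuming Euclidean distance and an arbitrary (first) start vector, neither of which the log states:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of equal-length vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]          # assumed start vector
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the vector farthest from its nearest already-selected vector.
        far = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(far)
        remaining.remove(far)
    return selected

# Hypothetical 2-D demo points (the log's weight vectors are 7-dimensional):
demo = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
print(farthest_first(demo, 3))  # [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
```

This is O(k·n·|selected|), which is fine at the sample sizes seen here (tens of vectors from a few hundred).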
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.420, 1.000, 1.000, 1.000, 1.000, 1.000, 0.947] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 42 matches and 2 non-matches
Purity of oracle classification: 0.955
Entropy of oracle classification: 0.267
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)450_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981481
recall 0.177258
f-measure 0.300283
da 54
dm 0
ndm 0
tp 53
fp 1
tn 4.76529e+07
fn 246
Name: (15, 1 - acm diverg, 450), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)450_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 524
Number of entity ID pairs that occurred more than once: 0
Analyse set of 524 weight vectors
Containing 210 true matches and 314 true non-matches
(40.08% true matches)
Identified 488 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 471 (96.52%)
2 : 14 (2.87%)
3 : 2 (0.41%)
19 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 488 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 311
Removed 1 non-pure weight vector
Final number of weight vectors to use: 523
Number of unique weight vectors: 488
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(488, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 488 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 488 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 33 matches and 47 non-matches
Purity of oracle classification: 0.588
Entropy of oracle classification: 0.978
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 47
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 408 weight vectors
Based on 33 matches and 47 non-matches
Classified 138 matches and 270 non-matches
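The SVM step above trains on the oracle-labelled vectors and splits the rest of the cluster by predicted class. A hedged sketch using scikit-learn's `SVC` with default parameters (the log names neither the library nor the kernel, so these are assumptions):

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, unlabeled_vecs):
    """Train an SVM on oracle-labelled weight vectors (label 1 = match,
    0 = non-match) and split the unlabelled vectors by predicted class."""
    clf = SVC()  # kernel/parameters not stated in the log; sklearn defaults assumed
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(unlabeled_vecs)
    matches = [v for v, p in zip(unlabeled_vecs, preds) if p == 1]
    non_matches = [v for v, p in zip(unlabeled_vecs, preds) if p == 0]
    return matches, non_matches
```

Applied to the 33 matches and 47 non-matches labelled in this loop, such a split would yield the two child clusters (138 and 270 vectors) that appear in the queue below.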
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(138, 0.5875, 0.9777945702913884, 0.4125)
(270, 0.5875, 0.9777945702913884, 0.4125)
Current size of match and non-match training data sets: 33 / 47
Selected cluster with (queue ordering: random):
- Purity 0.59 and entropy 0.98
- Size 270 weight vectors
- Estimated match proportion 0.412
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 270 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.800, 0.636, 0.563, 0.545, 0.722] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 4 matches and 65 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.319
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
54.0
Analysing file: diverg(10)570_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 570), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)570_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 419
Number of entity ID pairs that occurred more than once: 0
Analyse set of 419 weight vectors
Containing 202 true matches and 217 true non-matches
(48.21% true matches)
Identified 393 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 379 (96.44%)
2 : 11 (2.80%)
3 : 2 (0.51%)
12 : 1 (0.25%)
Identified 1 non-pure unique weight vector (from 393 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 216
Removed 1 non-pure weight vector
Final number of weight vectors to use: 418
Number of unique weight vectors: 393
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(393, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 393 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using "far" method
Farthest first selection of 77 weight vectors from 393 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 40 matches and 37 non-matches
Purity of oracle classification: 0.519
Entropy of oracle classification: 0.999
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 37
Number of false non-matches: 0
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 316 weight vectors
Based on 40 matches and 37 non-matches
Classified 136 matches and 180 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.5194805194805194, 0.9989047442823606, 0.5194805194805194)
(180, 0.5194805194805194, 0.9989047442823606, 0.5194805194805194)
Current size of match and non-match training data sets: 40 / 37
Selected cluster with (queue ordering: random):
- Purity 0.52 and entropy 1.00
- Size 136 weight vectors
- Estimated match proportion 0.519
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 136 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 0.933, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 50 matches and 6 non-matches
Purity of oracle classification: 0.893
Entropy of oracle classification: 0.491
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)636_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985915
recall 0.234114
f-measure 0.378378
da 71
dm 0
ndm 0
tp 70
fp 1
tn 4.76529e+07
fn 229
Name: (10, 1 - acm diverg, 636), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)636_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 845
Number of entity ID pairs that occurred more than once: 0
Analyse set of 845 weight vectors
Containing 186 true matches and 659 true non-matches
(22.01% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 771 (95.78%)
2 : 31 (3.85%)
3 : 2 (0.25%)
6 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 805 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.000 : 639
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 845
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
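The "far" selection used above is a farthest-first traversal: starting from a seed vector, repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and an arbitrary seed; the tool's actual seeding and metric may differ:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k vectors (sketch of the
    "far" method; seed choice and metric are assumptions)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[start]]                      # arbitrary seed
    # distance from each vector to its nearest selected vector
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):              # update nearest distances
            d = dist(v, vectors[idx])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

Each round costs one pass over all vectors, so selecting k of n vectors is O(kn) distance evaluations.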
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
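The purity and entropy reported after each oracle round are consistent with the majority-class fraction and the binary Shannon entropy of the match/non-match counts; a sketch, assuming exactly those definitions:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class fraction (purity) and binary Shannon entropy of a
    match/non-match split; matches the statistics printed per round."""
    total = num_matches + num_non_matches
    p = num_matches / total                # also the estimated match proportion
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

For the 29 matches and 57 non-matches above this gives 0.663 and 0.922, as logged; the match fraction 29/86 ≈ 0.337 is also the estimated match proportion shown in the queue.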
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 29 matches and 57 non-matches
Classified 163 matches and 556 non-matches
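The split step trains a classifier on the oracle-labelled samples and partitions the remaining weight vectors into a predicted-match and a predicted-non-match cluster. A rough sketch of that splitting logic, with a simple nearest-centroid rule standing in for the SVM the tool actually trains:

```python
def split_cluster(unlabelled, match_examples, non_match_examples):
    """Partition unlabelled vectors into predicted matches / non-matches.
    Sketch only: nearest-centroid replaces the tool's SVM."""
    def centroid(vecs):
        n = len(vecs)
        return [sum(col) / n for col in zip(*vecs)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    m_centre = centroid(match_examples)
    n_centre = centroid(non_match_examples)
    matches, non_matches = [], []
    for v in unlabelled:
        (matches if sq_dist(v, m_centre) < sq_dist(v, n_centre)
         else non_matches).append(v)
    return matches, non_matches
```

Both resulting clusters are then pushed onto the queue, which is why the queue length grows to 2 in the next loop.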
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(163, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(556, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 556 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 556 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 0 matches and 74 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
71.0
Analysing file: diverg(20)580_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 580), dtype: object
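The precision, recall, and f-measure figures in these per-file summaries follow the standard definitions over the tp/fp/fn counts (e.g. recall = 57 / (57 + 242) ≈ 0.190635 above); a sketch:

```python
def prf(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion counts,
    with zero-division guarded."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

With tp = 57, fp = 0, fn = 242 this reproduces the logged 1 / 0.190635 / 0.320225.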
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)580_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
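The distribution above can be computed with two nested counts; a sketch assuming each weight vector is represented as a hashable tuple:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique weight vectors that
    occur that often (the distribution printed in the log)."""
    per_vector = Counter(map(tuple, weight_vectors))  # vector -> frequency
    return Counter(per_vector.values())               # frequency -> count
```

For instance, a list containing one vector twice and two vectors once yields `{1: 2, 2: 1}`.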
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
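A non-pure unique weight vector is one whose copies carry both match and non-match labels; here the vector occurring 12 times has pureness 11/12 ≈ 0.917, so its single minority-class (non-match) copy is removed. A sketch of that cleaning step, with tie handling left open since the log does not show it:

```python
from collections import defaultdict

def remove_minority_class(records):
    """records: list of (weight_vector_tuple, is_match). For each unique
    vector carrying both labels, drop the minority-class copies.
    Sketch; behaviour at exactly 50/50 pureness is an assumption."""
    groups = defaultdict(list)
    for vec, label in records:
        groups[vec].append(label)
    kept = []
    for vec, label in records:
        labels = groups[vec]
        match_frac = sum(labels) / len(labels)   # the logged "pureness"
        if match_frac in (0.0, 1.0):             # pure vector: keep all copies
            kept.append((vec, label))
        elif label == (match_frac > 0.5):        # keep majority class only
            kept.append((vec, label))
    return kept
```

Applied to this file, one record is dropped, leaving 1082 of 1083 weight vectors, as logged.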
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 101 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(15)647_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 647), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)647_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 689
Number of entity ID pairs that occurred more than once: 0
Analyse set of 689 weight vectors
Containing 219 true matches and 470 true non-matches
(31.79% true matches)
Identified 656 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 640 (97.56%)
2 : 13 (1.98%)
3 : 2 (0.30%)
17 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 656 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 469
Removed 1 non-pure weight vector
Final number of weight vectors to use: 688
Number of unique weight vectors: 656
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(656, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 656 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 656 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 572 weight vectors
Based on 29 matches and 55 non-matches
Classified 128 matches and 444 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(128, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(444, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 444 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 444 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.409, 0.654, 0.500, 0.516, 0.333] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.500, 0.452, 0.632, 0.714, 0.667] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.786, 0.833, 0.545, 0.478, 0.346] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00 accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 16 matches and 57 non-matches
Purity of oracle classification: 0.781
Entropy of oracle classification: 0.759
Number of true matches: 16
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
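The purity and entropy figures reported above follow from the 16/57 split: purity 0.781 is the majority-class fraction (57/73) and entropy 0.759 is the binary Shannon entropy of the split. A minimal stdlib sketch of the presumed computation (the function name `purity_entropy` is illustrative, not from the original code):

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Majority-class purity and binary Shannon entropy of a cluster."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:  # 0 * log(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# 16 matches and 57 non-matches, as classified by the oracle above
purity, entropy = purity_entropy(16, 57)
print(round(purity, 3), round(entropy, 3))  # 0.781 0.759
```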
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(10)875_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (10, 1 - acm diverg, 875), dtype: object
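The precision, recall, and f-measure in the Series above are consistent with its counts (tp=95, fp=2, fn=204): precision = 95/97, recall = 95/299, and the f-measure is their harmonic mean, 190/396. A small sketch of that arithmetic (the helper `prf` is illustrative):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from match counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# tp=95, fp=2, fn=204 as printed in the Series above
p, r, f = prf(95, 2, 204)
print(round(p, 6), round(r, 6), round(f, 6))  # 0.979381 0.317726 0.479798
```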
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)875_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 987
Number of entity ID pairs that occurred more than once: 0
Analyse set of 987 weight vectors
Containing 169 true matches and 818 true non-matches
(17.12% true matches)
Identified 950 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 919 (96.74%)
2 : 28 (2.95%)
3 : 2 (0.21%)
6 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 950 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 152
0.000 : 798
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 987
Number of unique weight vectors: 950
Time to load and analyse the weight vector file: 0.01 sec
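The frequency distribution above is internally consistent (919·1 + 28·2 + 2·3 + 1·6 = 987 vectors, 919 + 28 + 2 + 1 = 950 unique). A sketch of how such a distribution could be tallied, assuming weight vectors are represented as tuples of floats (the function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique weight vectors
    occurring that often."""
    counts = Counter(map(tuple, vectors))  # vector -> occurrences
    return Counter(counts.values())        # occurrences -> #unique vectors

# toy example: one vector occurring 6 times, one twice, one once
vecs = [(1.0, 0.5)] * 6 + [(0.0, 0.2)] * 2 + [(0.3, 0.3)]
dist = occurrence_distribution(vecs)
print(dict(dist))  # {6: 1, 2: 1, 1: 1}
```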
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(950, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 950 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 950 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
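The "far" initial selection logged above is presumably the greedy farthest-first (Gonzalez) heuristic: start from one vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. A stdlib-only sketch under that assumption (function names and the squared-Euclidean metric are illustrative):

```python
import random

def _sqdist(a, b):
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first selection of k indices from vectors."""
    rng = random.Random(seed)
    chosen = [rng.randrange(len(vectors))]
    # distance from each vector to its nearest chosen vector so far
    dist = [_sqdist(v, vectors[chosen[0]]) for v in vectors]
    while len(chosen) < k:
        nxt = max(range(len(vectors)), key=lambda i: dist[i])
        chosen.append(nxt)
        for i, v in enumerate(vectors):
            dist[i] = min(dist[i], _sqdist(v, vectors[nxt]))
    return chosen
```

Selecting k = len(vectors) simply enumerates every vector; for k smaller than that, the chosen sample spreads out to cover the extremes of the cluster, which is why the lists above mix very high and very low similarity vectors.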
Perform oracle with 100.00 accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 25 matches and 62 non-matches
Purity of oracle classification: 0.713
Entropy of oracle classification: 0.865
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 863 weight vectors
Based on 25 matches and 62 non-matches
Classified 95 matches and 768 non-matches
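The SVM step above presumably trains on the 25 + 62 oracle-labelled vectors and then splits the remaining 863 into predicted matches and non-matches. The log does not show the kernel or solver used; as a self-contained stand-in, a linear SVM trained by hinge-loss subgradient descent (Pegasos-style) illustrates the train-then-split pattern:

```python
import random

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01, seed=0):
    """Tiny linear SVM via subgradient descent on the hinge loss.
    Labels y must be +1 (match) / -1 (non-match)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # regularisation shrink
            if margin < 1:  # hinge-loss subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def svm_classify(w, b, X):
    """Split unlabelled vectors into predicted matches / non-matches."""
    matches = [x for x in X if sum(wj * xj for wj, xj in zip(w, x)) + b > 0]
    non = [x for x in X if sum(wj * xj for wj, xj in zip(w, x)) + b <= 0]
    return matches, non
```

The two predicted sub-clusters then re-enter the queue, which is why Loop 2 below shows queue length 2 with sizes 95 and 768.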
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(95, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
(768, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 25 / 62
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 95 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 95 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
Perform oracle with 100.00 accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analysing file: diverg(20)143_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 143), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)143_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00 accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 30 matches and 58 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 30 matches and 58 non-matches
Classified 159 matches and 780 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(159, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
(780, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
Current size of match and non-match training data sets: 30 / 58
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 159 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 159 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00 accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)778_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 778), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)778_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 863
Number of entity ID pairs that occurred more than once: 0
Analyse set of 863 weight vectors
Containing 156 true matches and 707 true non-matches
(18.08% true matches)
Identified 827 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 799 (96.61%)
2 : 25 (3.02%)
3 : 2 (0.24%)
8 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 827 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 140
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 686
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 855
Number of unique weight vectors: 826
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(826, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 826 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 826 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and misclassify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
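The purity and entropy figures reported in blocks like the one above follow directly from the match / non-match counts of the oracle-labelled sample. A minimal sketch in Python (the function name is illustrative, not taken from the original tool):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = share of the majority class; entropy = binary Shannon
    entropy (in bits) of the match / non-match proportions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```

For the 29 matches and 57 non-matches above this gives purity 0.663 and entropy 0.922, matching the log; a perfectly pure sample (such as the later 0-match / 66-non-match block) gives purity 1.0 and entropy 0.0.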
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 740 weight vectors
Based on 29 matches and 57 non-matches
Classified 120 matches and 620 non-matches
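The SVM step uses the oracle-labelled sample as training data and splits the remaining unlabelled vectors into a candidate-match and a candidate-non-match cluster. The log's classifier is an SVM (its implementation is not shown here); as a dependency-free stand-in, a nearest-centroid split sketches the same train-then-split pattern:

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def split_by_labels(train_matches, train_non_matches, unlabeled):
    """Split unlabeled vectors into predicted matches / non-matches by
    distance to the class centroids of the oracle-labelled sample.
    (Stand-in for the SVM used in the log, not the original code.)"""
    cm = centroid(train_matches)
    cn = centroid(train_non_matches)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    matches, non_matches = [], []
    for v in unlabeled:
        (matches if sqdist(v, cm) < sqdist(v, cn) else non_matches).append(v)
    return matches, non_matches
```

The two resulting clusters are then pushed back onto the queue, which is why the next loop reports a queue of length 2 with sizes 120 and 620.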
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(120, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(620, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 120 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 50
Farthest first selection of 50 weight vectors from 120 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.952, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
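The farthest-first selections above can be sketched with the greedy traversal of Gonzalez: seed with one vector, then repeatedly add the vector whose minimum Euclidean distance to the already selected ones is largest. A sketch under these assumptions (the seed choice and distance metric of the original tool may differ):

```python
def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: returns the indices of k vectors
    that are spread as far apart as possible."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    selected = [start]
    # minimum distance of every vector to the current selected set
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            d = dist(v, vectors[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```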
Perform oracle with 100.00% accuracy on 50 weight vectors
The oracle will correctly classify 50 weight vectors and misclassify 0
Classified 40 matches and 10 non-matches
Purity of oracle classification: 0.800
Entropy of oracle classification: 0.722
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 10
Number of false non-matches: 0
Deleted 50 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing the file: diverg(10)741_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.197324
f-measure 0.329609
da 59
dm 0
ndm 0
tp 59
fp 0
tn 4.76529e+07
fn 240
Name: (10, 1 - acm diverg, 741), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)741_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 641
Number of entity ID pairs that occurred more than once: 0
Analyse set of 641 weight vectors
Containing 190 true matches and 451 true non-matches
(29.64% true matches)
Identified 596 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 562 (94.30%)
2 : 31 (5.20%)
3 : 2 (0.34%)
11 : 1 (0.17%)
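A frequency distribution like the one above can be obtained by hashing each exact weight vector and then tabulating how many unique vectors share each occurrence count. A minimal sketch (function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map 'occurrence count' -> 'number of unique weight vectors
    that occur exactly that often'."""
    vec_counts = Counter(tuple(v) for v in weight_vectors)
    return Counter(vec_counts.values())
```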
Identified 1 non-pure unique weight vector (from 596 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 430
Removed 1 non-pure weight vector
Final number of weight vectors to use: 640
Number of unique weight vectors: 596
Time to load and analyse the weight vector file: 0.01 sec
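The removal step above (one unique vector occurred 11 times with pureness 0.909, and a single record was dropped) can be sketched as follows. The assumption that exactly the minority-class records of each non-pure unique vector are removed comes from the log's own parenthetical; the behaviour on exact ties is an assumption:

```python
from collections import Counter, defaultdict

def remove_minority_class(records):
    """records: list of (weight_vector_tuple, is_match). For every unique
    vector that carries both labels, drop its minority-class records so
    each remaining unique vector is pure."""
    labels = defaultdict(Counter)
    for vec, is_match in records:
        labels[vec][is_match] += 1
    kept = []
    for vec, is_match in records:
        counts = labels[vec]
        # keep if the vector is pure, or if this record belongs to the
        # majority class (ties keep both classes -- an assumption)
        if len(counts) == 1 or counts[is_match] == max(counts.values()):
            kept.append((vec, is_match))
    return kept
```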
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(596, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 596 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 596 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and misclassify 0
Classified 26 matches and 56 non-matches
Purity of oracle classification: 0.683
Entropy of oracle classification: 0.901
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 514 weight vectors
Based on 26 matches and 56 non-matches
Classified 188 matches and 326 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(188, 0.6829268292682927, 0.9011701959974223, 0.3170731707317073)
(326, 0.6829268292682927, 0.9011701959974223, 0.3170731707317073)
Current size of match and non-match training data sets: 26 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 326 weight vectors
- Estimated match proportion 0.317
Sample size for this cluster: 66
Farthest first selection of 66 weight vectors from 326 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.000, 0.600, 0.818, 0.571, 0.524] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.269, 0.478, 0.750, 0.385, 0.455] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[1.000, 0.000, 0.538, 0.600, 0.471, 0.632, 0.688] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.500, 0.571, 0.467, 0.467, 0.389] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.179, 0.500, 0.412, 0.357] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.800, 0.667, 0.381, 0.550, 0.429] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.571, 0.286, 0.333, 0.571, 0.600] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.583, 0.786, 0.842, 0.800, 0.833] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 66 weight vectors
The oracle will correctly classify 66 weight vectors and misclassify 0
Classified 0 matches and 66 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 66 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
59.0
Analysing the file: diverg(15)758_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (15, 1 - acm diverg, 758), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)758_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 581
Number of entity ID pairs that occurred more than once: 0
Analyse set of 581 weight vectors
Containing 187 true matches and 394 true non-matches
(32.19% true matches)
Identified 559 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 543 (97.14%)
2 : 13 (2.33%)
3 : 2 (0.36%)
6 : 1 (0.18%)
Identified 0 non-pure unique weight vectors (from 559 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 167
0.000 : 392
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 581
Number of unique weight vectors: 559
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(559, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 559 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 559 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and misclassify 0
Classified 29 matches and 53 non-matches
Purity of oracle classification: 0.646
Entropy of oracle classification: 0.937
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 477 weight vectors
Based on 29 matches and 53 non-matches
Classified 141 matches and 336 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
(336, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
Current size of match and non-match training data sets: 29 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 141 weight vectors
- Estimated match proportion 0.354
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 141 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and misclassify 0
Classified 47 matches and 7 non-matches
Purity of oracle classification: 0.870
Entropy of oracle classification: 0.556
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing the file: diverg(20)997_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 997), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)997_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 28 matches and 60 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 28 matches and 60 non-matches
Classified 156 matches and 800 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
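The "farthest first selection" step above (picking 88 of 1044 vectors) is a greedy farthest-first traversal: start from one vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. A minimal sketch, assuming Euclidean distance; the function name `farthest_first` is mine, not from the tool's source:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors.

    Starts from the first vector, then repeatedly picks the vector
    whose distance to the closest already-selected vector is largest.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]
    # min_dist[i] = distance from vectors[i] to its nearest selected vector
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            min_dist[j] = min(min_dist[j], dist(v, vectors[i]))
    return selected
```

This explains why the selected samples above are so spread out: extreme vectors (all-1.000 or all-0.000 components) are chosen early, which gives the oracle a diverse sample to label.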
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
(800, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
Current size of match and non-match training data sets: 28 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 156 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 156 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
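The purity and entropy figures printed after each oracle round are the majority-class fraction and the binary Shannon entropy of the match / non-match split. A minimal sketch (function names are mine, not from the tool):

```python
import math

def purity(n_match, n_nonmatch):
    """Fraction of samples in the majority class."""
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    """Binary Shannon entropy (in bits) of the match/non-match split."""
    total = n_match + n_nonmatch
    p = n_match / total
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

For Loop 2 of the run above (49 matches, 5 non-matches) this gives purity ≈ 0.907 and entropy ≈ 0.445, matching the log; a perfectly pure split (e.g. 0 matches, 74 non-matches) gives purity 1.000 and entropy 0.000.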
39.0
Analysing file: diverg(20)201_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 201), dtype: object
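The precision, recall, and f-measure in the per-file summary rows follow the standard definitions from the raw tp/fp/fn counts. A small sketch (the helper `prf` is mine):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For the row above (tp = 39, fp = 0, fn = 260) this yields precision 1.0, recall ≈ 0.130435, and f-measure ≈ 0.230769, matching the logged values.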
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)201_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
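The preprocessing reported above — grouping identical weight vectors, tabulating how often each unique vector occurs, and removing the minority-class copies of any non-pure vector — can be sketched as below. The input format and the exact removal rule are assumptions inferred from the log:

```python
from collections import Counter, defaultdict

def analyse(labelled_vectors):
    """labelled_vectors: list of (vector_tuple, is_match) pairs.

    Returns the occurrence distribution of unique vectors and the
    list kept after dropping minority-class copies of non-pure
    unique vectors (those with 0 < pureness < 1).
    """
    by_vector = defaultdict(list)
    for vec, is_match in labelled_vectors:
        by_vector[vec].append(is_match)

    # occurrence : number of unique vectors that occur that often
    occ_dist = Counter(len(labels) for labels in by_vector.values())

    kept = []
    for vec, labels in by_vector.items():
        pureness = sum(labels) / len(labels)
        if 0.0 < pureness < 1.0:
            # non-pure vector: keep only the majority-class copies
            majority = pureness > 0.5
            labels = [lab for lab in labels if lab == majority]
        kept.extend((vec, lab) for lab in labels)
    return occ_dist, kept
```

Applied to the file above, the one unique vector occurring 20 times with pureness 0.950 (19 matches, 1 non-match) would lose its single minority-class copy, reducing 1101 vectors to the reported 1100.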
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
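Each loop ends by training a classifier on the oracle-labelled samples and using it to split the remaining cluster (here, 956 vectors into 109 predicted matches and 847 non-matches, which become the two queue entries of Loop 2). A minimal sketch using scikit-learn's `SVC`; the log does not state the kernel or parameters, so `kernel='linear'` is an assumption:

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining):
    """Train an SVM on oracle-labelled weight vectors, then split the
    remaining unlabelled vectors into predicted matches/non-matches."""
    clf = SVC(kernel='linear')  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    pred = clf.predict(remaining)
    matches = [v for v, p in zip(remaining, pred) if p]
    non_matches = [v for v, p in zip(remaining, pred) if not p]
    return matches, non_matches
```

Both resulting sub-clusters inherit the parent's purity/entropy estimates (visible in the identical queue tuples above) until the oracle samples from them directly.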
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)812_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.197324
f-measure 0.329609
da 59
dm 0
ndm 0
tp 59
fp 0
tn 4.76529e+07
fn 240
Name: (10, 1 - acm diverg, 812), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)812_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 893
Number of entity ID pairs that occurred more than once: 0
Analyse set of 893 weight vectors
Containing 177 true matches and 716 true non-matches
(19.82% true matches)
Identified 848 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 814 (95.99%)
2 : 31 (3.66%)
3 : 2 (0.24%)
11 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 848 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 152
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 695
Removed 1 non-pure weight vector
Final number of weight vectors to use: 892
Number of unique weight vectors: 848
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(848, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 848 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 848 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 762 weight vectors
Based on 28 matches and 58 non-matches
Classified 172 matches and 590 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(172, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(590, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 590 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 590 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.565, 0.667, 0.600, 0.412, 0.381] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.783, 0.357, 0.750, 0.412, 0.238] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.500, 0.600, 0.353, 0.611, 0.526] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.368, 0.235, 0.429] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and misclassify 0
Classified 0 matches and 74 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
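The purity and entropy figures reported for oracle classifications throughout this log (e.g. purity 1.000 and entropy 0.000 for the 74 non-matches above, or 0.622 and 0.957 for a later 31/51 split) follow the standard binary definitions. A minimal sketch, with an illustrative function name:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction; entropy is the Shannon
    entropy (base 2) of the match / non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

For 0 matches and 74 non-matches this yields purity 1.000 and entropy 0.000; for 31 matches and 51 non-matches it yields approximately 0.622 and 0.957, matching the values logged below.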
59.0
Analyzing file: diverg(15)292_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (15, 1 - acm diverg, 292), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)292_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 597
Number of entity ID pairs that occurred more than once: 0
Analyse set of 597 weight vectors
Containing 201 true matches and 396 true non-matches
(33.67% true matches)
Identified 566 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 550 (97.17%)
2 : 13 (2.30%)
3 : 2 (0.35%)
15 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 566 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.933 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 393
Removed 1 non-pure weight vector
Final number of weight vectors to use: 596
Number of unique weight vectors: 566
Time to load and analyse the weight vector file: 0.04 sec
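The load-and-analyse step above (grouping identical weight vectors, counting how often each occurs, and removing minority-class copies of the one non-pure vector) can be sketched as follows. This is a minimal sketch; the function and variable names are illustrative, not the tool's actual internals:

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors, build the occurrence frequency
    distribution, and drop minority-class copies of non-pure vectors."""
    groups = {}  # unique vector -> list of True/False match labels
    for vec, lab in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(lab)

    # Occurrence count -> number of unique vectors occurring that often
    freq = Counter(len(labs) for labs in groups.values())

    kept = []
    for vec, labs in groups.items():
        pureness = sum(labs) / len(labs)  # fraction of match labels
        if 0.0 < pureness < 1.0:
            # Non-pure unique vector: keep only the majority-class copies
            majority = pureness >= 0.5
            labs = [lab for lab in labs if lab == majority]
        kept.extend((vec, lab) for lab in labs)
    return freq, kept
```

With the 597 vectors above this would report 566 unique vectors and drop the single minority-class copy of the 0.933-pure vector, leaving 596.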
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(566, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 566 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 566 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
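The "far" initial-selection method logged above is a greedy farthest-first traversal over the cluster's weight vectors. A minimal sketch, assuming Euclidean distance (the actual metric used by the tool is not shown in this log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from an arbitrary vector,
    then repeatedly pick the vector whose distance to its closest
    already-selected vector is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]
    # Minimum distance from each candidate to the selected set
    min_d = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: min_d[j])
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            min_d[j] = min(min_d[j], dist(v, vectors[i]))
    return selected
```

This spreads the sample across the weight-vector space, which is why the selected vectors above mix clear matches, clear non-matches, and borderline cases.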
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and misclassify 0
Classified 31 matches and 51 non-matches
Purity of oracle classification: 0.622
Entropy of oracle classification: 0.957
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 484 weight vectors
Based on 31 matches and 51 non-matches
Classified 144 matches and 340 non-matches
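The SVM split logged above trains on the oracle-labelled sample and partitions the remaining unlabelled vectors of the cluster into predicted matches and non-matches. A sketch using scikit-learn's `SVC` as a stand-in for the tool's SVM (an assumption; the kernel and parameters used by the tool are not shown in this log):

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train on the oracle-labelled sample, then split the unlabelled
    remainder of the cluster into predicted matches / non-matches."""
    clf = SVC()  # default RBF kernel; an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches
```

The two resulting sub-clusters are pushed onto the queue, which is why the next loop below shows a queue of length 2 with sizes 144 and 340.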
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
(340, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
Current size of match and non-match training data sets: 31 / 51
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 144 weight vectors
- Estimated match proportion 0.378
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 144 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and misclassify 0
Classified 49 matches and 7 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analyzing file: diverg(15)605_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 605), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)605_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 793
Number of entity ID pairs that occurred more than once: 0
Analyse set of 793 weight vectors
Containing 223 true matches and 570 true non-matches
(28.12% true matches)
Identified 754 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 735 (97.48%)
2 : 16 (2.12%)
3 : 2 (0.27%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 754 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 186
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 567
Removed 1 non-pure weight vector
Final number of weight vectors to use: 792
Number of unique weight vectors: 754
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(754, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 754 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 754 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and misclassify 0
Classified 32 matches and 53 non-matches
Purity of oracle classification: 0.624
Entropy of oracle classification: 0.956
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 669 weight vectors
Based on 32 matches and 53 non-matches
Classified 149 matches and 520 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
(520, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
Current size of match and non-match training data sets: 32 / 53
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 149 weight vectors
- Estimated match proportion 0.376
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 149 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and misclassify 0
Classified 52 matches and 4 non-matches
Purity of oracle classification: 0.929
Entropy of oracle classification: 0.371
Number of true matches: 52
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)276_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 276), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)276_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 955
Number of entity ID pairs that occurred more than once: 0
Analyse set of 955 weight vectors
Containing 216 true matches and 739 true non-matches
(22.62% true matches)
Identified 900 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 864 (96.00%)
2 : 33 (3.67%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 900 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 718
Removed 1 non-pure weight vector
Final number of weight vectors to use: 954
Number of unique weight vectors: 900
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(900, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 900 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 900 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 23 matches and 63 non-matches
Purity of oracle classification: 0.733
Entropy of oracle classification: 0.838
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 814 weight vectors
Based on 23 matches and 63 non-matches
Classified 0 matches and 814 non-matches
40.0
Analysing file: diverg(15)948_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (15, 1 - acm diverg, 948), dtype: object
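The precision, recall, and f-measure in the record above follow directly from its tp/fp/fn counts (precision = tp/(tp+fp), recall = tp/(tp+fn), F = 2PR/(P+R)); a quick self-contained check:

```python
def pr_metrics(tp, fp, fn):
    """Precision, recall and f-measure from true/false positive and
    false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts reported for diverg(15)948_NEW.csv
p, r, f = pr_metrics(69, 1, 230)
print(round(p, 6), round(r, 6), round(f, 6))  # 0.985714 0.230769 0.373984
```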
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)948_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1005
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1005 weight vectors
Containing 195 true matches and 810 true non-matches
(19.40% true matches)
Identified 963 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 928 (96.37%)
2 : 32 (3.32%)
3 : 2 (0.21%)
7 : 1 (0.10%)
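The occurrence distribution above (how many distinct weight vectors appear exactly k times) can be reproduced with two nested counters; a minimal sketch on toy data, since loading the actual CSV is omitted here:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map k -> number of distinct weight vectors occurring exactly k times."""
    copies_per_vector = Counter(tuple(wv) for wv in weight_vectors)
    return Counter(copies_per_vector.values())

# Two vectors occur once, one vector occurs twice
dist = occurrence_distribution([(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9)])
```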
Identified 0 non-pure unique weight vectors (from 963 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.000 : 790
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1005
Number of unique weight vectors: 963
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(963, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 963 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 963 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
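The farthest-first selection logged above can be sketched as a greedy max-min traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. The seed vector and the Euclidean metric are assumptions here, since the log does not show the run's choices:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: starting from a seed, repeatedly
    add the vector whose minimum Euclidean distance to the vectors
    selected so far is largest."""
    selected = [vectors[0]]  # seed choice is an assumption
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        min_dist = [min(d, math.dist(v, vectors[idx]))
                    for v, d in zip(vectors, min_dist)]
    return selected

# Picks (0, 0), then the opposite corner (1, 1), then a remaining corner
sample = farthest_first([(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)], 3)
```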
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
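The purity and entropy reported for this oracle sample (28 matches, 59 non-matches) are the majority-class fraction and the binary entropy of the match proportion; a quick check, assuming base-2 entropy as the later queue output (0.9063701886...) suggests:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary (base-2)
    entropy of the match proportion."""
    total = num_matches + num_non_matches
    purity = max(num_matches, num_non_matches) / total
    p = num_matches / total
    if p in (0.0, 1.0):
        entropy = 0.0  # a pure cluster carries no uncertainty
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

purity, entropy = cluster_stats(28, 59)
print(round(purity, 3), round(entropy, 3))  # 0.678 0.906
```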
SVM classification of 876 weight vectors
Based on 28 matches and 59 non-matches
Classified 142 matches and 734 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(734, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 142 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 48 matches and 5 non-matches
Purity of oracle classification: 0.906
Entropy of oracle classification: 0.451
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(15)740_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 740), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)740_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 953
Number of entity ID pairs that occurred more than once: 0
Analyse set of 953 weight vectors
Containing 201 true matches and 752 true non-matches
(21.09% true matches)
Identified 908 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 874 (96.26%)
2 : 31 (3.41%)
3 : 2 (0.22%)
11 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 908 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 952
Number of unique weight vectors: 908
Time to load and analyse the weight vector file: 0.01 sec
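The non-pure clean-up above (the unique vector with pureness 0.909 = 10/11 loses its single minority-class copy, 953 - 1 = 952) can be sketched as follows; the tie-breaking rule is an assumption, since no tie appears in the log:

```python
from collections import defaultdict

def drop_minority_copies(labelled_vectors):
    """Group identical weight vectors and keep only the copies that
    belong to each group's majority class."""
    groups = defaultdict(list)
    for wv, is_match in labelled_vectors:
        groups[tuple(wv)].append(is_match)
    kept = []
    for wv, labels in groups.items():
        matches = sum(labels)
        non_matches = len(labels) - matches
        majority_is_match = matches >= non_matches  # tie-break is an assumption
        kept += [(wv, majority_is_match)] * max(matches, non_matches)
    return kept

# One vector occurs 11 times with pureness 10/11; its single
# minority-class copy is dropped (12 rows in, 11 rows out).
rows = [((0.9, 0.9), True)] * 10 + [((0.9, 0.9), False)] + [((0.1, 0.2), False)]
pure_rows = drop_minority_copies(rows)
```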
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(908, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 908 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 908 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 25 matches and 62 non-matches
Purity of oracle classification: 0.713
Entropy of oracle classification: 0.865
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 821 weight vectors
Based on 25 matches and 62 non-matches
Classified 110 matches and 711 non-matches
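The SVM step above trains on the 25 matches and 62 non-matches labelled by the oracle and assigns each of the remaining 821 vectors to one side, splitting the cluster in two. A minimal sketch with scikit-learn; the kernel and parameters of the actual run are not shown in the log, so the linear kernel is an assumption:

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, rest_vectors):
    """Train an SVM on the oracle-labelled sample, then split the
    remaining weight vectors into predicted matches / non-matches."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(rest_vectors)
    matches = [v for v, p in zip(rest_vectors, preds) if p]
    non_matches = [v for v, p in zip(rest_vectors, preds) if not p]
    return matches, non_matches

# Toy 2-D stand-in for the 7-dimensional weight vectors in the log
X = [[0.9, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.1]]
y = [True, True, False, False]
matches, non_matches = svm_split(X, y, [[0.95, 0.85], [0.05, 0.15]])
```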
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(110, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
(711, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 25 / 62
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 711 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 711 vectors
The selected farthest weight vectors are:
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.368, 0.235, 0.429] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 13 matches and 58 non-matches
Purity of oracle classification: 0.817
Entropy of oracle classification: 0.687
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(20)830_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (20, 1 - acm diverg, 830), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)830_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1041
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1041 weight vectors
Containing 213 true matches and 828 true non-matches
(20.46% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 954 (96.46%)
2 : 32 (3.24%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 807
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1040
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
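The purity and entropy figures logged for each oracle round can be reproduced from the match/non-match counts alone: purity is the majority-class fraction and entropy is the binary entropy of the match proportion. A minimal sketch (the exact formulas are an inference, but they reproduce the logged values):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    # Purity = fraction of the majority class; entropy = binary entropy
    # of the match proportion (inferred from the logged 0.701 / 0.880).
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

With the counts above (26 matches, 61 non-matches) this yields purity 0.701 and entropy 0.880, matching the log.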
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 902 weight vectors
Based on 26 matches and 61 non-matches
Classified 109 matches and 793 non-matches
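The SVM step that splits the remaining weight vectors, trained on the oracle-labelled samples, could look roughly as follows. scikit-learn's `SVC` with a linear kernel is an assumption; the log does not reveal which SVM implementation, kernel, or parameters were used.

```python
# Hypothetical re-creation of the SVM split step (scikit-learn and a
# linear kernel are assumptions, not confirmed by the log).
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, remaining_vecs):
    # Train on the oracle-classified samples (1 = match, 0 = non-match),
    # then partition the unlabelled weight vectors by the prediction.
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p == 1]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if p == 0]
    return matches, non_matches
```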
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 109 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 47 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
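The "farthest first" selections performed in each loop above amount to a greedy max-min (k-center) pass: repeatedly add the vector whose minimum distance to the already-selected set is largest. A sketch under the assumption of a Euclidean metric and a fixed starting vector (the log does not state either):

```python
def farthest_first(vectors, k):
    # Greedy max-min selection: start from the first vector (an
    # assumption) and repeatedly pick the candidate whose minimum
    # squared Euclidean distance to the selected set is largest.
    selected = [vectors[0]]
    while len(selected) < k and len(selected) < len(vectors):
        def min_dist(v):
            return min(sum((a - b) ** 2 for a, b in zip(v, s))
                       for s in selected)
        selected.append(max((v for v in vectors if v not in selected),
                            key=min_dist))
    return selected
```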
53.0
Analysing file: diverg(20)737_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 737), dtype: object
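The precision, recall, and f-measure values in each per-file summary follow directly from the tp/fp/fn counts printed alongside them; a quick sketch to verify (e.g. tp=39, fp=0, fn=260 gives the f-measure 0.230769 above):

```python
def prf(tp, fp, fn):
    # Standard precision/recall/F1 from the confusion counts in the
    # per-file summary (tn is not needed for these three measures).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```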
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)737_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
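The occurrence frequency distribution reported during file analysis (how many distinct weight vectors occur once, twice, and so on) can be computed with two nested `Counter` passes; a minimal sketch:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # First count how often each distinct weight vector occurs, then
    # tally how many distinct vectors share each occurrence count.
    vec_counts = Counter(tuple(v) for v in weight_vectors)
    return Counter(vec_counts.values())
```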
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 24 matches and 64 non-matches
Purity of oracle classification: 0.727
Entropy of oracle classification: 0.845
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 24 matches and 64 non-matches
Classified 91 matches and 857 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(91, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
(857, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
Current size of match and non-match training data sets: 24 / 64
Selected cluster with (queue ordering: random):
- Purity 0.73 and entropy 0.85
- Size 857 weight vectors
- Estimated match proportion 0.273
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 857 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 18 matches and 52 non-matches
Purity of oracle classification: 0.743
Entropy of oracle classification: 0.822
Number of true matches: 18
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)91_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 91), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)91_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1031
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1031 weight vectors
Containing 212 true matches and 819 true non-matches
(20.56% true matches)
Identified 979 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 944 (96.42%)
2 : 32 (3.27%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 979 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 798
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1030
Number of unique weight vectors: 979
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(979, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 979 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 979 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 892 weight vectors
Based on 26 matches and 61 non-matches
Classified 136 matches and 756 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(756, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 756 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 756 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 10 matches and 63 non-matches
Purity of oracle classification: 0.863
Entropy of oracle classification: 0.576
Number of true matches: 10
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(20)581_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 581), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)581_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
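The frequency distribution above (how often each distinct weight vector occurs, and what share of the unique vectors each frequency accounts for) can be produced with a Counter-of-Counters pass; a sketch assuming vectors arrive as sequences of floats:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct vectors occurring that often."""
    per_vector = Counter(map(tuple, vectors))    # vector -> how often it occurs
    distribution = Counter(per_vector.values())  # occurrence -> number of vectors
    total_unique = len(per_vector)
    for occ in sorted(distribution):
        n = distribution[occ]
        # Percentage is relative to the number of unique vectors, as in the log.
        print(f"{occ} : {n} ({100 * n / total_unique:.2f}%)")
    return distribution

occurrence_distribution([[1, 0], [1, 0], [0, 1], [0.5, 0.5]])
```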
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
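The "far" selection listed above is a farthest-first traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the cluster. A minimal NumPy sketch (the starting point and distance metric of the logged run are assumptions; Euclidean distance and index 0 shown here):

```python
import numpy as np

def farthest_first(vectors: np.ndarray, k: int, start: int = 0) -> list[int]:
    """Greedy farthest-first traversal: return indices of k diverse vectors."""
    selected = [start]
    # Minimum distance from every vector to the selected set so far.
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))      # vector farthest from all selected
        selected.append(nxt)
        d = np.linalg.norm(vectors - vectors[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)  # update distances to selected set
    return selected

rng = np.random.default_rng(0)
pts = rng.random((100, 7))                  # 100 seven-dimensional weight vectors
idx = farthest_first(pts, 10)
print(idx)
```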
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
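After the oracle labels the sample, the remaining unlabeled vectors are classified by an SVM trained on the oracle-labeled 23 matches and 65 non-matches. A sketch with scikit-learn on stand-in data (the kernel and parameters are assumptions; the log does not state them):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Stand-in data: labeled sample (oracle-classified) and unlabeled remainder.
X_match = rng.random((23, 7)) * 0.3 + 0.7  # matches: high similarity weights
X_non = rng.random((65, 7)) * 0.5          # non-matches: lower weights
X_train = np.vstack([X_match, X_non])
y_train = np.array([1] * 23 + [0] * 65)

clf = SVC(kernel="rbf", gamma="scale")     # kernel choice is an assumption
clf.fit(X_train, y_train)

X_rest = rng.random((956, 7))              # the 956 remaining weight vectors
pred = clf.predict(X_rest)
print(f"Classified {int(pred.sum())} matches and {int((pred == 0).sum())} non-matches")
```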
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)339_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 339), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)339_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 810
Number of entity ID pairs that occurred more than once: 0
Analyse set of 810 weight vectors
Containing 223 true matches and 587 true non-matches
(27.53% true matches)
Identified 756 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 719 (95.11%)
2 : 34 (4.50%)
3 : 2 (0.26%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 756 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 566
Removed 1 non-pure weight vector
Final number of weight vectors to use: 809
Number of unique weight vectors: 756
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(756, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 756 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 756 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 671 weight vectors
Based on 27 matches and 58 non-matches
Classified 94 matches and 577 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(577, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 94 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 94 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.950, 0.923, 0.941] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 44 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)841_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 841), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)841_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
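The purity and entropy figures reported above follow the usual definitions for a two-class cluster: purity is the majority-class fraction of the oracle-labelled sample, and entropy is the binary Shannon entropy of the match/non-match split. A minimal sketch reproducing the numbers for this loop (the function name is illustrative, not from the original code):

```python
import math

def purity_and_entropy(n_matches, n_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon entropy in bits."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 14 matches and 54 non-matches, as classified by the oracle above:
purity, entropy = purity_and_entropy(14, 54)
print(round(purity, 3), round(entropy, 3))  # 0.794 0.734
```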
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)497_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 497), dtype: object
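The precision, recall, and f-measure values in the summary above are consistent with the reported confusion counts (tp 39, fp 0, fn 260), computed the standard way:

```python
def prf(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts from the Series summary above:
p, r, f = prf(tp=39, fp=0, fn=260)
print(p, round(r, 6), round(f, 6))  # 1.0 0.130435 0.230769
```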
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)497_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 226 true matches and 857 true non-matches
(20.87% true matches)
Identified 1026 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 989 (96.39%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1026 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1026
Time to load and analyse the weight vector file: 0.01 sec
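The preprocessing described above checks each distinct weight vector's pureness (the fraction of its occurrences that are true matches) and removes the minority-class copies of any vector that occurs with both labels, as with the 0.950-pureness vector removed here. A sketch under those assumptions (the function name and data layout are illustrative):

```python
from collections import Counter

def remove_non_pure(vectors, labels):
    """Drop minority-class copies of weight vectors that occur with both labels."""
    counts = Counter()  # (vector, label) -> number of occurrences
    for v, y in zip(vectors, labels):
        counts[(tuple(v), y)] += 1
    kept = []
    for v, y in zip(vectors, labels):
        key = tuple(v)
        # Keep this copy only if its label is the majority label for this vector.
        if counts[(key, y)] >= counts[(key, not y)]:
            kept.append((v, y))
    return kept

# One vector occurs 20 times as 19 matches and 1 non-match (pureness 0.95);
# the single minority-class copy is removed.
data = [([0.5], True)] * 19 + [([0.5], False)] + [([0.9], True)]
kept = remove_non_pure([v for v, _ in data], [y for _, y in data])
print(len(kept))  # 20
```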
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1026, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1026 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1026 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
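Farthest-first traversal picks a seed and then repeatedly adds the vector whose minimum distance to the already-selected set is largest, which is why the selections listed above spread over the extremes of the weight space. A minimal sketch (Euclidean distance and a fixed first seed are assumptions; the original seeding strategy is not shown in the log):

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first traversal: repeatedly select the vector that
    maximises the minimum distance to the vectors selected so far."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [seed_index]
    # Minimum distance from every vector to the current selection.
    min_dist = [dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected

pts = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0], [0.5, 0.5]]
print(farthest_first(pts, 3))  # [0, 1, 4]
```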
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 28 matches and 60 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 938 weight vectors
Based on 28 matches and 60 non-matches
Classified 159 matches and 779 non-matches
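As logged above, the remaining unlabelled vectors in the cluster are split by an SVM trained on the oracle-labelled sample. A sketch with scikit-learn (the kernel and parameters are assumptions; the original SVM settings are not shown in the log):

```python
# Assumes scikit-learn is available; kernel choice is illustrative.
from sklearn.svm import SVC

def split_cluster(train_X, train_y, unlabelled_X):
    """Train an SVM on the oracle-labelled sample, then split the remaining
    vectors into a predicted-match and a predicted-non-match cluster."""
    clf = SVC(kernel="rbf").fit(train_X, train_y)
    pred = clf.predict(unlabelled_X)
    matches = [x for x, p in zip(unlabelled_X, pred) if p]
    non_matches = [x for x, p in zip(unlabelled_X, pred) if not p]
    return matches, non_matches

# Toy 1-D example: matches cluster near 1.0, non-matches near 0.0.
train_X = [[0.9], [0.95], [1.0], [0.05], [0.1], [0.0]]
train_y = [True, True, True, False, False, False]
matches, non_matches = split_cluster(train_X, train_y, [[0.92], [0.08], [0.03]])
print(len(matches), len(non_matches))  # 1 2
```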
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(159, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
(779, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
Current size of match and non-match training data sets: 28 / 60
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 779 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 779 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 3 matches and 72 non-matches
Purity of oracle classification: 0.960
Entropy of oracle classification: 0.242
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)639_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 639), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)639_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 829
Number of entity ID pairs that occurred more than once: 0
Analyse set of 829 weight vectors
Containing 214 true matches and 615 true non-matches
(25.81% true matches)
Identified 775 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 740 (95.48%)
2 : 32 (4.13%)
3 : 2 (0.26%)
19 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 775 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 594
Removed 1 non-pure weight vector
Final number of weight vectors to use: 828
Number of unique weight vectors: 775
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(775, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 775 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 775 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 690 weight vectors
Based on 29 matches and 56 non-matches
Classified 150 matches and 540 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(540, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 150 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 150 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 47 matches and 8 non-matches
Purity of oracle classification: 0.855
Entropy of oracle classification: 0.598
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(20)548_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 548), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)548_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
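The "farthest first" selection above can be sketched as a greedy max-min traversal: seed with one vector, then repeatedly pick the vector farthest from everything already selected. This is a minimal sketch; the actual tool's seeding strategy and distance metric are assumptions here.

```python
import numpy as np

def farthest_first(vectors, k, rng=None):
    """Greedy farthest-first traversal: seed with a random vector, then
    repeatedly add the vector whose minimum Euclidean distance to the
    already-selected set is largest."""
    X = np.asarray(vectors, dtype=float)
    gen = np.random.default_rng(rng)
    selected = [int(gen.integers(len(X)))]             # random seed point (assumption)
    dist = np.linalg.norm(X - X[selected[0]], axis=1)  # min distance to selected set
    while len(selected) < k:
        nxt = int(np.argmax(dist))                     # farthest remaining vector
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Each selected index maximizes the minimum distance to the current sample, which is why the chosen weight vectors above are so spread out across the [0, 1] hypercube.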
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
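The purity and entropy figures reported after each oracle round follow directly from the match / non-match counts: purity is the majority-class proportion and entropy is the binary Shannon entropy (in bits) of the split. A minimal sketch:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = proportion of the majority class; entropy = binary
    Shannon entropy (in bits) of the match / non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# The first oracle round above: 29 matches, 56 non-matches.
purity, entropy = purity_entropy(29, 56)
print(round(purity, 3), round(entropy, 3))  # 0.659 0.926
```

These reproduce the "Purity of oracle classification: 0.659" and "Entropy of oracle classification: 0.926" lines above.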
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 29 matches and 56 non-matches
Classified 146 matches and 538 non-matches
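The "SVM classification" step trains on the oracle-labelled sample and splits the remaining unlabelled weight vectors into predicted matches and non-matches, which become the two new clusters in the queue. A hedged sketch using scikit-learn; the kernel and parameters the tool actually uses are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labelled_X, labelled_y, unlabelled_X):
    """Train an SVM on the oracle-labelled sample, then split the
    remaining unlabelled weight vectors into predicted matches and
    non-matches (labels: 1 = match, 0 = non-match)."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(labelled_X, labelled_y)
    pred = clf.predict(unlabelled_X)
    return unlabelled_X[pred == 1], unlabelled_X[pred == 0]
```

In the round above, 29 matches and 56 non-matches train the classifier, and the 684 remaining vectors are split into 146 predicted matches and 538 predicted non-matches.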
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(538, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 146 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 146 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 50 matches and 4 non-matches
Purity of oracle classification: 0.926
Entropy of oracle classification: 0.381
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)606_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 606), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)606_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 831
Number of entity ID pairs that occurred more than once: 0
Analyse set of 831 weight vectors
Containing 227 true matches and 604 true non-matches
(27.32% true matches)
Identified 774 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 737 (95.22%)
2 : 34 (4.39%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 774 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 583
Removed 1 non-pure weight vector
Final number of weight vectors to use: 830
Number of unique weight vectors: 774
Time to load and analyse the weight vector file: 0.01 sec
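The frequency-distribution and pureness analysis reported above (how often each unique weight vector occurs, and what fraction of its occurrences are true matches) can be reproduced with a small grouping pass. A sketch; the function and variable names are assumptions:

```python
from collections import Counter

def analyse_weight_vectors(vectors, is_match_labels):
    """Group identical weight vectors, count how often each unique
    vector occurs, and compute the pureness (fraction of true matches)
    of each unique vector."""
    groups = {}
    for vec, is_match in zip(vectors, is_match_labels):
        groups.setdefault(tuple(vec), []).append(is_match)
    # occurrence count : number of unique vectors occurring that often
    freq = Counter(len(labels) for labels in groups.values())
    pureness = {vec: sum(labels) / len(labels) for vec, labels in groups.items()}
    return freq, pureness
```

A unique vector with pureness strictly between 0 and 1 is "non-pure" (it occurs with both match and non-match labels); the log above removes the minority-class copies of such vectors before clustering.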
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(774, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 774 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 774 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 689 weight vectors
Based on 28 matches and 57 non-matches
Classified 151 matches and 538 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(538, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 538 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 538 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 9 matches and 64 non-matches
Purity of oracle classification: 0.877
Entropy of oracle classification: 0.539
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)3_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 3), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)3_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 744
Number of entity ID pairs that occurred more than once: 0
Analyse set of 744 weight vectors
Containing 197 true matches and 547 true non-matches
(26.48% true matches)
Identified 702 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 667 (95.01%)
2 : 32 (4.56%)
3 : 2 (0.28%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 702 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.000 : 527
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 744
Number of unique weight vectors: 702
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(702, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 702 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 702 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 618 weight vectors
Based on 27 matches and 57 non-matches
Classified 143 matches and 475 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(143, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(475, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 475 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 475 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.714, 0.727, 0.750, 0.294, 0.833] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.348, 0.429, 0.526, 0.529, 0.619] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and misclassify 0
Classified 4 matches and 67 non-matches
Purity of oracle classification: 0.944
Entropy of oracle classification: 0.313
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
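The purity and entropy reported above follow directly from the match / non-match counts. A minimal sketch, assuming purity is the majority-class fraction of the oracle-labelled sample and entropy is the binary Shannon entropy of the match proportion:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# Counts reported above: 4 true matches, 67 true non-matches
purity, entropy = purity_and_entropy(4, 67)
print(round(purity, 3), round(entropy, 3))  # 0.944 0.313
```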
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(20)634_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 634), dtype: object
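The precision, recall and f-measure values in the series above are consistent with the raw tp/fp/fn counts. A small check, assuming the standard definitions:

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure from match counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts from the series above: tp=39, fp=0, fn=260
p, r, f = precision_recall_f(39, 0, 260)
print(round(p, 6), round(r, 6), round(f, 6))  # 1.0 0.130435 0.230769
```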
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)634_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 789
Number of entity ID pairs that occurred more than once: 0
Analyse set of 789 weight vectors
Containing 225 true matches and 564 true non-matches
(28.52% true matches)
Identified 750 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 731 (97.47%)
2 : 16 (2.13%)
3 : 2 (0.27%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 750 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 561
Removed 1 non-pure weight vector
Final number of weight vectors to use: 788
Number of unique weight vectors: 750
Time to load and analyse the weight vector file: 0.01 sec
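The uniqueness and frequency-distribution figures above can be reproduced with a counter over the raw weight vectors. A minimal sketch (the short vectors below are illustrative, not taken from the file):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique weight vectors seen that often."""
    vec_counts = Counter(tuple(v) for v in vectors)  # vector -> how often it occurs
    return Counter(vec_counts.values())              # occurrences -> number of vectors

vectors = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.9]]
dist = occurrence_distribution(vectors)
print(dist)  # Counter({1: 2, 2: 1})
```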
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(750, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 750 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 750 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.429, 0.786, 0.750, 0.389, 0.857] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
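The farthest-first selection used above is a greedy max-min traversal: each step picks the vector whose minimum distance to the already selected set is largest. A minimal sketch (the seed choice and Euclidean metric are assumptions; the tool's actual seeding and tie-breaking are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors (Euclidean distance)."""
    selected = [vectors[0]]  # assumed seed: the first vector
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the remaining vector farthest from its nearest selected vector
        best = max(remaining, key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

pts = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [1.0, 0.0]]
sel = farthest_first(pts, 3)
print(sel)  # [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
```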
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and misclassify 0
Classified 32 matches and 53 non-matches
Purity of oracle classification: 0.624
Entropy of oracle classification: 0.956
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 665 weight vectors
Based on 32 matches and 53 non-matches
Classified 161 matches and 504 non-matches
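The SVM step above trains on the 32 oracle-labelled matches and 53 non-matches and then splits the remaining 665 vectors into predicted matches and non-matches. A sketch of that step, assuming a scikit-learn linear SVM and randomly generated stand-in vectors (the tool's actual kernel, parameters and data are not shown in the log):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the oracle-labelled training vectors
X_match = rng.uniform(0.6, 1.0, size=(32, 7))      # 32 matches
X_non_match = rng.uniform(0.0, 0.5, size=(53, 7))  # 53 non-matches
X_train = np.vstack([X_match, X_non_match])
y_train = np.array([1] * 32 + [0] * 53)

clf = SVC(kernel="linear").fit(X_train, y_train)

# Classify the 665 weight vectors left in the cluster
X_rest = rng.uniform(0.0, 1.0, size=(665, 7))
pred = clf.predict(X_rest)
print(int(pred.sum()), int((pred == 0).sum()))  # match / non-match split
```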
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(161, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
(504, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
Current size of match and non-match training data sets: 32 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 504 weight vectors
- Estimated match proportion 0.376
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 504 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.556, 0.429, 0.500, 0.700, 0.643] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.333, 0.667, 0.400, 0.583, 0.563] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and misclassify 0
Classified 3 matches and 73 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.240
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)606_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 606), dtype: object
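The f-measure in this series can also be checked directly from the counts via the identity F1 = 2*tp / (2*tp + fp + fn), which avoids computing precision and recall separately:

```python
def f_measure_from_counts(tp, fp, fn):
    """F-measure via the count identity F1 = 2*tp / (2*tp + fp + fn)."""
    return 2 * tp / (2 * tp + fp + fn)

# Counts from the series above: tp=52, fp=1, fn=247
f1 = f_measure_from_counts(52, 1, 247)
print(round(f1, 6))  # 0.295455
```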
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)606_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 770
Number of entity ID pairs that occurred more than once: 0
Analyse set of 770 weight vectors
Containing 212 true matches and 558 true non-matches
(27.53% true matches)
Identified 718 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 683 (95.13%)
2 : 32 (4.46%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 718 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 180
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 537
Removed 1 non-pure weight vector
Final number of weight vectors to use: 769
Number of unique weight vectors: 718
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(718, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 718 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 718 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and misclassify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 634 weight vectors
Based on 31 matches and 53 non-matches
Classified 293 matches and 341 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(293, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(341, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 293 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 293 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.600, 1.000, 0.217, 0.132, 0.167, 0.125, 0.188] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and misclassify 0
Classified 43 matches and 25 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(10)84_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 84), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)84_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1018
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1018 weight vectors
Containing 220 true matches and 798 true non-matches
(21.61% true matches)
Identified 964 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 927 (96.16%)
2 : 34 (3.53%)
3 : 2 (0.21%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 964 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 777
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1017
Number of unique weight vectors: 964
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(964, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 964 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 964 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.800, 0.000, 0.444, 0.545, 0.333, 0.111, 0.533] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 33 matches and 54 non-matches
Purity of oracle classification: 0.621
Entropy of oracle classification: 0.958
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
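The purity and entropy figures reported in these oracle summaries follow from the standard majority-class fraction and binary Shannon entropy over the match/non-match split; a minimal sketch (the function name is illustrative, not from the original code):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary Shannon entropy."""
    total = num_matches + num_non_matches
    p = num_matches / total          # proportion of matches
    purity = max(p, 1.0 - p)        # majority-class fraction
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

For the 33 matches and 54 non-matches above, this gives purity ≈ 0.621 and entropy ≈ 0.958, matching the logged values.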
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 877 weight vectors
Based on 33 matches and 54 non-matches
Classified 298 matches and 579 non-matches
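The SVM step trains on the oracle-labelled samples and then classifies the remaining weight vectors of the cluster into a match and a non-match sub-cluster; a hedged sketch using scikit-learn (the kernel choice and parameters of the original implementation are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labelled_vecs, labels, unlabelled_vecs):
    """Train an SVM on oracle-labelled weight vectors (label 1 = match,
    0 = non-match), then classify the remaining vectors into two groups."""
    clf = SVC(kernel="linear")  # kernel is an assumption, not from the log
    clf.fit(np.asarray(labelled_vecs), np.asarray(labels))
    pred = clf.predict(np.asarray(unlabelled_vecs))
    matches = [v for v, p in zip(unlabelled_vecs, pred) if p == 1]
    non_matches = [v for v, p in zip(unlabelled_vecs, pred) if p == 0]
    return matches, non_matches
```

The two returned groups correspond to the two clusters pushed onto the queue in the next loop.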
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(298, 0.6206896551724138, 0.9575534837147482, 0.3793103448275862)
(579, 0.6206896551724138, 0.9575534837147482, 0.3793103448275862)
Current size of match and non-match training data sets: 33 / 54
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 298 weight vectors
- Estimated match proportion 0.379
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 298 vectors
The selected farthest weight vectors are:
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.261, 0.174, 0.148, 0.186, 0.148] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
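The farthest-first selections logged throughout greedily add, at each step, the vector whose minimum distance to the already-selected set is largest; a minimal sketch assuming Euclidean distance and an arbitrary starting vector (both assumptions, as the log does not state them):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly add the vector whose
    minimum distance to the already-selected set is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

This favours diverse samples, which is why the selected lists mix clearly high and clearly low similarity vectors.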
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 43 matches and 26 non-matches
Purity of oracle classification: 0.623
Entropy of oracle classification: 0.956
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)400_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.99
recall 0.331104
f-measure 0.496241
da 100
dm 0
ndm 0
tp 99
fp 1
tn 4.76529e+07
fn 200
Name: (15, 1 - acm diverg, 400), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)400_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 997
Number of entity ID pairs that occurred more than once: 0
Analyse set of 997 weight vectors
Containing 167 true matches and 830 true non-matches
(16.75% true matches)
Identified 958 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 929 (96.97%)
2 : 26 (2.71%)
3 : 2 (0.21%)
10 : 1 (0.10%)
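A frequency distribution like the one above (occurrence count versus how many unique vectors occur that often) can be derived with two nested counts; a sketch using `collections.Counter`, with vectors converted to tuples so they are hashable:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map 'number of occurrences' -> 'how many unique vectors occur that often'."""
    per_vector = Counter(tuple(v) for v in vectors)  # occurrences of each vector
    return Counter(per_vector.values())              # histogram of those counts
```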
Identified 1 non-pure unique weight vector (from 958 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 148
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 809
Removed 1 non-pure weight vector
Final number of weight vectors to use: 996
Number of unique weight vectors: 958
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
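The pureness filter applied during loading keeps only unique weight vectors whose true labels all agree (pureness exactly 0.0 or 1.0) and removes mixed ones such as the 0.900-pureness vector above; a minimal sketch (the data layout is an assumption):

```python
def remove_non_pure(vector_labels):
    """vector_labels: dict mapping a unique weight vector (as a tuple) to the
    list of true labels (True = match) of its record pairs. Keep only vectors
    whose labels all agree, i.e. pureness 0.0 or 1.0."""
    kept = {}
    for vec, labels in vector_labels.items():
        pureness = sum(labels) / len(labels)  # fraction of true matches
        if pureness in (0.0, 1.0):
            kept[vec] = labels
    return kept
```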
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(958, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 958 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 958 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 871 weight vectors
Based on 31 matches and 56 non-matches
Classified 280 matches and 591 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(280, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(591, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 591 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 591 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.667, 0.538, 0.455, 0.581, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 0 matches and 76 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 76
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
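The oracle's accuracy parameter (100.00 in these runs, i.e. a perfect labeller) can be understood as a simulated labeller that flips each true label with probability 1 - accuracy; a hedged sketch, not the original implementation:

```python
import random

def simulated_oracle(true_labels, accuracy=1.0, rng=None):
    """Return oracle labels: each true label is kept with probability
    `accuracy` and flipped otherwise (accuracy=1.0 gives a perfect oracle)."""
    rng = rng or random.Random()
    return [lab if rng.random() < accuracy else not lab
            for lab in true_labels]
```

With accuracy 1.0 every label is kept, which is why the runs above report zero false matches and zero false non-matches.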
100.0
Analysing the file: diverg(10)357_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985915
recall 0.234114
f-measure 0.378378
da 71
dm 0
ndm 0
tp 70
fp 1
tn 4.76529e+07
fn 229
Name: (10, 1 - acm diverg, 357), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)357_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 479
Number of entity ID pairs that occurred more than once: 0
Analyse set of 479 weight vectors
Containing 175 true matches and 304 true non-matches
(36.53% true matches)
Identified 461 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 449 (97.40%)
2 : 9 (1.95%)
3 : 2 (0.43%)
6 : 1 (0.22%)
Identified 0 non-pure unique weight vectors (from 461 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 157
0.000 : 304
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 479
Number of unique weight vectors: 461
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(461, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 461 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 461 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.364, 0.619, 0.471, 0.600, 0.533] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 28 matches and 51 non-matches
Purity of oracle classification: 0.646
Entropy of oracle classification: 0.938
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 382 weight vectors
Based on 28 matches and 51 non-matches
Classified 131 matches and 251 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.6455696202531646, 0.9379626436434423, 0.35443037974683544)
(251, 0.6455696202531646, 0.9379626436434423, 0.35443037974683544)
Current size of match and non-match training data sets: 28 / 51
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.94
- Size 251 weight vectors
- Estimated match proportion 0.354
Sample size for this cluster: 65
Farthest first selection of 65 weight vectors from 251 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.579, 0.583, 0.522, 0.417, 0.563] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.923, 0.667, 0.667, 0.412, 0.571] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.500, 0.452, 0.632, 0.714, 0.667] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 0.000, 0.750, 0.714, 0.500, 0.412, 0.762] (False)
[1.000, 0.000, 0.565, 0.857, 0.833, 0.412, 0.667] (False)
[1.000, 0.000, 0.846, 0.684, 0.529, 0.727, 0.700] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.591, 0.762, 0.647, 0.636, 0.550] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.333, 0.667, 0.400, 0.583, 0.563] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
Perform oracle with 100.00% accuracy on 65 weight vectors
The oracle will correctly classify 65 weight vectors and wrongly classify 0
Classified 2 matches and 63 non-matches
Purity of oracle classification: 0.969
Entropy of oracle classification: 0.198
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
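The purity and entropy figures reported for each oracle-classified sample follow directly from the match/non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A minimal sketch (the function name is illustrative, not from the original code):

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity is the majority-class fraction; entropy is the
    binary Shannon entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Reproduces the figures logged above: 2 matches, 63 non-matches
purity, entropy = cluster_stats(2, 63)
print(round(purity, 3), round(entropy, 3))  # 0.969 0.198
```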
Deleted 65 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
71.0
Analysing the file: diverg(15)344_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 344), dtype: object
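The precision, recall, and f-measure values in the Series dump above are consistent with the reported tp/fp/fn counts under the standard definitions; a quick sketch of that check:

```python
def prf(tp, fp, fn):
    # Standard precision / recall / F1 definitions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Counts reported for diverg(15)344_NEW.csv: tp=42, fp=0, fn=257
p, r, f = prf(42, 0, 257)
print(p, round(r, 6), round(f, 6))  # 1.0 0.140468 0.246334
```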
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)344_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 744
Number of entity ID pairs that occurred more than once: 0
Analyse set of 744 weight vectors
Containing 220 true matches and 524 true non-matches
(29.57% true matches)
Identified 708 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 689 (97.32%)
2 : 16 (2.26%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 708 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 521
Removed 1 non-pure weight vector
Final number of weight vectors to use: 743
Number of unique weight vectors: 708
Time to load and analyse the weight vector file: 0.01 sec
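The occurrence distribution above can be produced with a simple counter over the vectors; a minimal sketch, assuming each weight vector is a hashable tuple (the helper name is illustrative):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of unique
    weight vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in vectors)
    return Counter(per_vector.values())

# The distribution logged above implies these totals:
dist = {1: 689, 2: 16, 3: 2, 17: 1}
total_vectors = sum(occ * n for occ, n in dist.items())
unique_vectors = sum(dist.values())
print(total_vectors, unique_vectors)  # 744 708
```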
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(708, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 708 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 708 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
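The farthest-first selections shown in these samples can be sketched as the classic farthest-first traversal: start from one vector, then repeatedly pick the vector whose distance to its nearest already-selected vector is greatest. A minimal sketch; the original code's distance metric and seed choice are assumptions:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal under Euclidean distance."""
    selected = [vectors[0]]  # assumed seed: the first vector
    # Distance from every vector to its nearest selected vector
    dist = [math.dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: dist[j])
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            dist[j] = min(dist[j], math.dist(v, vectors[i]))
    return selected
```

Greedily maximising the distance to the nearest selected point spreads the sample across the weight-vector space, which is why the samples above mix clear matches and clear non-matches.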
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 624 weight vectors
Based on 29 matches and 55 non-matches
Classified 151 matches and 473 non-matches
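The SVM step trains on the oracle-labelled sample and splits the remaining unlabelled vectors into a candidate-match and a candidate-non-match cluster, which become the two queue entries of the next loop. A minimal sketch assuming scikit-learn; the kernel and parameters are assumptions, since the log does not show them:

```python
from sklearn.svm import SVC

def split_cluster(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on the oracle-labelled sample, then split the
    unlabelled remainder into predicted matches / non-matches."""
    clf = SVC(kernel="linear")  # assumed kernel
    clf.fit(train_vectors, train_labels)
    pred = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, pred) if p == 1]
    non_matches = [v for v, p in zip(remaining_vectors, pred) if p == 0]
    return matches, non_matches
```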
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(473, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 151 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)887_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 887), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)887_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1082
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1082 weight vectors
Containing 209 true matches and 873 true non-matches
(19.32% true matches)
Identified 1035 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1000 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1035 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1081
Number of unique weight vectors: 1035
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1035, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1035 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1035 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 947 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 846 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(846, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 846 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 846 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(10)298_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (10, 1 - acm diverg, 298), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)298_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 480
Number of entity ID pairs that occurred more than once: 0
Analyse set of 480 weight vectors
Containing 154 true matches and 326 true non-matches
(32.08% true matches)
Identified 467 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 458 (98.07%)
2 : 6 (1.28%)
3 : 2 (0.43%)
4 : 1 (0.21%)
Identified 0 non-pure unique weight vectors (from 467 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 141
0.000 : 326
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 480
Number of unique weight vectors: 467
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(467, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 467 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 467 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
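The farthest-first sampling that produces lists like the one above can be sketched as a greedy traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. This is a sketch only; the actual seeding rule and distance metric used by the program are assumptions.

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal. `start` and Euclidean distance
    are assumptions; the log does not state how the first vector is chosen."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # Distance from every vector to its nearest already-selected vector.
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))          # farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

# Two farthest-apart picks from four 2-D points:
print(farthest_first([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]], 2))  # [0, 3]
```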
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 26 matches and 53 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
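The purity and entropy reported for each oracle-classified sample follow from the match / non-match counts: purity as the majority-class fraction and entropy as the binary Shannon entropy (base 2). This small sketch of the assumed formulas reproduces the figures above:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity (majority-class fraction) and base-2 Shannon entropy of a
    binary-labelled cluster; assumed to match the log's definitions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 26 matches and 53 non-matches, as classified above:
purity, entropy = cluster_stats(26, 53)
print(round(purity, 3), round(entropy, 3))  # 0.671 0.914
```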
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 388 weight vectors
Based on 26 matches and 53 non-matches
Classified 109 matches and 279 non-matches
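The SVM step trains on the oracle-labelled sample and propagates labels to the remaining unlabelled weight vectors. A minimal sketch with scikit-learn; the training data here is synthetic and the linear kernel is an assumption (the log does not state the kernel or parameters):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the 26 oracle-labelled matches and 53 non-matches:
# matches tend toward high similarities, non-matches toward low ones.
X_match = rng.uniform(0.6, 1.0, size=(26, 7))
X_non = rng.uniform(0.0, 0.5, size=(53, 7))
X_train = np.vstack([X_match, X_non])
y_train = np.array([1] * 26 + [0] * 53)

clf = SVC(kernel="linear")  # kernel choice is an assumption
clf.fit(X_train, y_train)

# Classify the remaining (unlabelled) weight vectors.
X_rest = rng.uniform(0.0, 1.0, size=(388, 7))
pred = clf.predict(X_rest)
print(int(pred.sum()), "matches,", int((pred == 0).sum()), "non-matches")
```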
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.6708860759493671, 0.9140185106642176, 0.3291139240506329)
(279, 0.6708860759493671, 0.9140185106642176, 0.3291139240506329)
Current size of match and non-match training data sets: 26 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 109 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 45 matches and 3 non-matches
Purity of oracle classification: 0.938
Entropy of oracle classification: 0.337
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing the file: diverg(15)794_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 794), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)794_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 742
Number of entity ID pairs that occurred more than once: 0
Analyse set of 742 weight vectors
Containing 220 true matches and 522 true non-matches
(29.65% true matches)
Identified 706 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 687 (97.31%)
2 : 16 (2.27%)
3 : 2 (0.28%)
17 : 1 (0.14%)
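A frequency distribution like the one above (how many distinct weight vectors occur once, twice, and so on) can be computed with two nested `Counter`s; the toy vectors below are made up:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to the number of distinct weight vectors
    occurring that often, as in the log's frequency table."""
    vec_counts = Counter(tuple(v) for v in weight_vectors)
    return Counter(vec_counts.values())

# One vector appearing twice, one once, one three times:
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3),
           (0.9, 0.9), (0.9, 0.9), (0.9, 0.9)]
print(occurrence_distribution(vectors))
```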
Identified 1 non-pure unique weight vector (from 706 unique weight vectors)
Pureness (as a proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 519
Removed 1 non-pure weight vector
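The non-pure removal drops, for each duplicated weight vector, the copies carrying the minority label (e.g. a vector occurring 17 times with pureness 0.941 loses its single minority-class copy, 16/17 ≈ 0.941). A hypothetical sketch of that filtering:

```python
from collections import defaultdict

def remove_minority_copies(weight_vectors, labels):
    """For each unique weight vector, keep only the instances carrying its
    majority label; the dropped minority copies are the 'non-pure' removals.
    Ties are kept as-is in this sketch (an assumption)."""
    by_vec = defaultdict(list)
    for i, v in enumerate(weight_vectors):
        by_vec[tuple(v)].append(i)
    keep = []
    for indices in by_vec.values():
        n_true = sum(1 for i in indices if labels[i])
        n_false = len(indices) - n_true
        if n_true == n_false:
            keep.extend(indices)  # tie: keep everything
        else:
            majority = n_true > n_false
            keep.extend(i for i in indices if labels[i] == majority)
    return sorted(keep)

# The duplicated vector (1,) keeps its two True copies, drops the False one:
print(remove_minority_copies([(1,), (1,), (1,), (0,)],
                             [True, True, False, False]))  # [0, 1, 3]
```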
Final number of weight vectors to use: 741
Number of unique weight vectors: 706
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(706, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 706 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 622 weight vectors
Based on 29 matches and 55 non-matches
Classified 151 matches and 471 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(471, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 471 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 471 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.870, 0.619, 0.643, 0.700, 0.524] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 6 matches and 67 non-matches
Purity of oracle classification: 0.918
Entropy of oracle classification: 0.410
Number of true matches: 6
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)333_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 333), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)333_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 804
Number of entity ID pairs that occurred more than once: 0
Analyse set of 804 weight vectors
Containing 208 true matches and 596 true non-matches
(25.87% true matches)
Identified 757 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 722 (95.38%)
2 : 32 (4.23%)
3 : 2 (0.26%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 757 unique weight vectors)
Pureness (as a proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 575
Removed 1 non-pure weight vector
Final number of weight vectors to use: 803
Number of unique weight vectors: 757
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(757, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 757 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 757 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 26 matches and 59 non-matches
Purity of oracle classification: 0.694
Entropy of oracle classification: 0.888
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 672 weight vectors
Based on 26 matches and 59 non-matches
Classified 139 matches and 533 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(139, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
(533, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
Current size of match and non-match training data sets: 26 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 139 weight vectors
- Estimated match proportion 0.306
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 139 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)304_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 304), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)304_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as a proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
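The "far" initial-selection method above is a farthest-first traversal: starting from a seed vector, repeatedly pick the vector whose distance to its nearest already-selected vector is largest. A minimal sketch, assuming Euclidean distance and an arbitrary first seed (the actual seeding rule is not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors (sketch of the 'far' method)."""
    selected = [vectors[0]]  # seed with an arbitrary vector (assumption)
    # Distance of every vector to its nearest selected vector so far.
    dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=dist.__getitem__)
        selected.append(vectors[idx])
        # Update nearest-selected distances with the newly added vector.
        dist = [min(d, math.dist(v, vectors[idx])) for v, d in zip(vectors, dist)]
    return selected
```

This greedy traversal spreads the sample across the weight-vector space, which is why the selections above mix clear matches (all-high similarities) with clear non-matches (low similarities) and borderline cases.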
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
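The purity and entropy figures reported after each oracle call are the majority-class fraction and the binary (base-2) class entropy of the classified sample. A short sketch reproducing the numbers above (23 matches, 65 non-matches gives purity 0.739 and entropy 0.829):

```python
import math

def purity_entropy(n_matches, n_non_matches):
    """Majority-class purity and binary entropy of a classified sample."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

These two values drive the loop's stopping rule: a cluster whose oracle sample is pure enough (low entropy) need not be split further.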
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
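The split step here trains an SVM on the oracle-labelled sample and uses it to divide the cluster's remaining vectors into a predicted-match and a predicted-non-match sub-cluster, which are then enqueued (Loop 2 below shows queue length 2 with sizes 109 and 847). A sketch assuming scikit-learn; the kernel choice and function name are assumptions, not the program's actual configuration:

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on oracle-labelled vectors, then split the rest."""
    clf = SVC(kernel="linear")  # kernel is an assumption
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if not p]
    return matches, non_matches
```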
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)58_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 58), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)58_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 961
Number of entity ID pairs that occurred more than once: 0
Analyse set of 961 weight vectors
Containing 217 true matches and 744 true non-matches
(22.58% true matches)
Identified 906 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 870 (96.03%)
2 : 33 (3.64%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 906 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 723
Removed 1 non-pure weight vector
Final number of weight vectors to use: 960
Number of unique weight vectors: 906
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(906, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 906 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 906 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 819 weight vectors
Based on 26 matches and 61 non-matches
Classified 135 matches and 684 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(684, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 135 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 135 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 50 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.139
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(15)245_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 245), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)245_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 29 matches and 56 non-matches
Classified 146 matches and 538 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(538, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 146 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 146 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 50 matches and 4 non-matches
Purity of oracle classification: 0.926
Entropy of oracle classification: 0.381
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
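The purity and entropy figures above are consistent with the standard majority-class fraction and binary (Shannon) entropy over the oracle's match/non-match split; a minimal sketch reproducing the reported 0.926 / 0.381 from 50 matches and 4 non-matches:

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Majority-class fraction and binary entropy of a two-class split."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# Loop-1 oracle result above: 50 true matches, 4 true non-matches
purity, entropy = purity_entropy(50, 4)
print(f"{purity:.3f} {entropy:.3f}")  # 0.926 0.381
```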
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(10)963_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 963), dtype: object
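The precision, recall, and f-measure in the dump above follow directly from the tp/fp/fn counts it reports; a quick consistency check:

```python
# counts from the Series dump above
tp, fp, fn = 40, 0, 259

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean

print(precision, round(recall, 6), round(f_measure, 6))  # 1.0 0.133779 0.235988
```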
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)963_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 690
Number of entity ID pairs that occurred more than once: 0
Analyse set of 690 weight vectors
Containing 217 true matches and 473 true non-matches
(31.45% true matches)
Identified 635 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 599 (94.33%)
2 : 33 (5.20%)
3 : 2 (0.31%)
19 : 1 (0.16%)
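The deduplication and frequency distribution above amount to two nested counts (vectors to occurrence counts, then a histogram of those counts); a minimal sketch on toy data, since the real vectors come from the CSV:

```python
from collections import Counter

# toy stand-ins for the weight vectors loaded from the CSV
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9)]

counts = Counter(vectors)        # occurrences of each unique vector
freq = Counter(counts.values())  # occurrence -> number of unique vectors
for occ in sorted(freq):
    n = freq[occ]
    print(f"{occ} : {n} ({100.0 * n / len(counts):.2f}%)")
# 1 : 2 (66.67%)
# 2 : 1 (33.33%)
```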
Identified 1 non-pure unique weight vector (from 635 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 452
Removed 1 non-pure weight vector
Final number of weight vectors to use: 689
Number of unique weight vectors: 635
Time to load and analyse the weight vector file: 0.01 sec
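The pureness filter above keeps, for each non-pure unique vector, only its majority-class occurrences (here the 0.947 vector is 18 matches and 1 non-match across its 19 occurrences, so the single non-match copy is dropped). A sketch on hypothetical toy data with the same shape:

```python
from collections import defaultdict

# hypothetical (vector, is_match) pairs mirroring the 18/1 split above
pairs = [((1.0, 0.9), True)] * 18 + [((1.0, 0.9), False)] \
      + [((0.2, 0.1), False)] * 3

by_vec = defaultdict(list)
for vec, label in pairs:
    by_vec[vec].append(label)

kept = []
for vec, labels in by_vec.items():
    pureness = sum(labels) / len(labels)  # fraction of matches
    if 0.0 < pureness < 1.0:
        majority = pureness >= 0.5        # drop the minority-class copies
        kept += [(vec, majority)] * labels.count(majority)
    else:
        kept += [(vec, lab) for lab in labels]

print(len(pairs), "->", len(kept))  # 22 -> 21 (one non-pure copy removed)
```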
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(635, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 635 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 635 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
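The "far" selection above reads as a greedy farthest-first traversal (Gonzalez-style): repeatedly pick the vector farthest from everything selected so far. The log does not state the metric or the seed, so this sketch assumes Euclidean distance and the first vector as the starting point:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k vectors (assumed Euclidean metric)."""
    selected = [vectors[start]]
    # distance from every vector to its nearest already-selected vector
    d = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=d.__getitem__)
        selected.append(vectors[i])
        d = [min(d[j], math.dist(vectors[j], vectors[i])) for j in range(len(vectors))]
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.5, 0.5)]
print(farthest_first(pts, 2))  # [(0.0, 0.0), (1.0, 1.0)]
```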
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 24 matches and 59 non-matches
Purity of oracle classification: 0.711
Entropy of oracle classification: 0.868
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 552 weight vectors
Based on 24 matches and 59 non-matches
Classified 46 matches and 506 non-matches
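The SVM step trains on the oracle-labelled vectors and classifies the remaining unlabelled ones. The log does not name the library or kernel; as a dependency-free stand-in, this sketch uses a nearest-centroid rule in the same train-then-classify role:

```python
import math

def centroid(rows):
    # per-dimension mean of a list of equal-length vectors
    return [sum(col) / len(rows) for col in zip(*rows)]

def classify(train_matches, train_nonmatches, rest):
    cm, cn = centroid(train_matches), centroid(train_nonmatches)
    # label each unlabelled vector by its nearer class centroid
    return [math.dist(v, cm) < math.dist(v, cn) for v in rest]

matches = [[1.0, 1.0], [0.9, 1.0]]
nonmatches = [[0.1, 0.0], [0.0, 0.2]]
rest = [[0.95, 0.9], [0.05, 0.1]]
pred = classify(matches, nonmatches, rest)
print(pred.count(True), "matches,", pred.count(False), "non-matches")
```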
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(46, 0.7108433734939759, 0.8676293117125106, 0.2891566265060241)
(506, 0.7108433734939759, 0.8676293117125106, 0.2891566265060241)
Current size of match and non-match training data sets: 24 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 506 weight vectors
- Estimated match proportion 0.289
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 506 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.875, 1.000, 0.182, 0.267, 0.237, 0.206, 0.167] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.423, 0.478, 0.500, 0.813, 0.545] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 22 matches and 46 non-matches
Purity of oracle classification: 0.676
Entropy of oracle classification: 0.908
Number of true matches: 22
Number of false matches: 0
Number of true non-matches: 46
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analyzing file: diverg(20)31_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 31), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)31_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 566 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 566 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 7 matches and 66 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.456
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)316_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 316), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)316_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 739
Number of entity ID pairs that occurred more than once: 0
Analyse set of 739 weight vectors
Containing 212 true matches and 527 true non-matches
(28.69% true matches)
Identified 687 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 652 (94.91%)
2 : 32 (4.66%)
3 : 2 (0.29%)
17 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 687 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 506
Removed 1 non-pure weight vector
Final number of weight vectors to use: 738
Number of unique weight vectors: 687
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(687, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 687 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 687 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 30 matches and 54 non-matches
Purity of oracle classification: 0.643
Entropy of oracle classification: 0.940
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
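The purity and entropy figures reported above are consistent with the usual definitions: purity as the majority-class fraction of the labelled sample, and entropy as the binary Shannon entropy of the match proportion. A minimal sketch (the function name is illustrative, not taken from the actual implementation):

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total                      # match proportion
    purity = max(p, 1.0 - p)                     # majority-class fraction
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 30 oracle-classified matches and 54 non-matches, as in the log above:
purity, entropy = purity_and_entropy(30, 54)
print(round(purity, 3), round(entropy, 3))  # 0.643 0.940
```

The same two numbers reappear unchanged as the purity/entropy of both child clusters in the next loop, since they are inherited from the parent's labelled sample.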
Cluster not pure enough or too large, and can be split further
SVM classification of 603 weight vectors
Based on 30 matches and 54 non-matches
Classified 291 matches and 312 non-matches
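The SVM step trains on the oracle-labelled sample and classifies the cluster's remaining vectors into matches and non-matches. A sketch of that step using scikit-learn; the kernel, parameters, and the random stand-in data below are assumptions, since the log does not show them:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the oracle-labelled training data (7-dimensional
# similarity weight vectors) and the unlabelled remainder.
X_train = rng.random((84, 7))
y_train = np.concatenate([np.ones(30), np.zeros(54)])  # 30 matches, 54 non-matches
X_rest = rng.random((603, 7))

clf = SVC(kernel="rbf")          # kernel choice is an assumption
clf.fit(X_train, y_train)
pred = clf.predict(X_rest)       # splits the cluster into matches / non-matches
print(int((pred == 1).sum()), int((pred == 0).sum()))
```

The two predicted groups become the two clusters pushed onto the queue for the next loop.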
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(291, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
(312, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
Current size of match and non-match training data sets: 30 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 291 weight vectors
- Estimated match proportion 0.357
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 291 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 44 matches and 24 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 24
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analyzing the file: diverg(15)60_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 60), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)60_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 221 true matches and 855 true non-matches
(20.54% true matches)
Identified 1020 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 984 (96.47%)
2 : 33 (3.24%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1020 unique weight vectors)
Pureness (as a percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 834
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1020
Time to load and analyse the weight vector file: 0.01 sec
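The non-pure filtering above groups identical weight vectors and, for any group that mixes matches and non-matches, drops the minority-class copies. A sketch under that reading of the log (the function name and tie-breaking rule are assumptions):

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """Group identical vectors; drop minority-class copies from mixed groups."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept, removed = [], 0
    for vec, labs in groups.items():
        n_match = sum(labs)
        majority = n_match * 2 >= len(labs)   # ties kept as matches (assumption)
        for lab in labs:
            if lab == majority:
                kept.append((vec, lab))
            else:
                removed += 1
    return kept, removed

# A vector occurring 20 times with 19 matches and 1 non-match has
# pureness 19/20 = 0.950; the single minority copy is removed.
vecs = [(0.9, 1.0)] * 20 + [(0.1, 0.0)]
labs = [True] * 19 + [False] + [False]
kept, removed = remove_minority_copies(vecs, labs)
print(len(kept), removed)  # 20 1
```

This matches the log: the one vector with pureness 0.950 loses its single minority-class copy, taking the set from 1076 to 1075 vectors.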
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1020, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1020 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1020 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
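The "farthest first" sampling that recurs throughout this log is the greedy max-min heuristic: start from one vector, then repeatedly add the candidate whose minimum distance to the already-selected set is largest. A sketch assuming Euclidean distance and an arbitrary starting point (both details are assumptions, as the log does not show them):

```python
import math

def farthest_first(vectors, k):
    """Greedy max-min selection of k vectors (Euclidean distance assumed)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]                      # arbitrary seed (assumption)
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the remaining vector farthest from its nearest selected vector.
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

corners = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)]
print(farthest_first(corners, 2))  # [(0.0, 0.0), (1.0, 1.0)]
```

This explains why the selected samples above mix extreme all-high and all-low vectors: the heuristic deliberately spreads the sample across the weight-vector space.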
SVM classification of 933 weight vectors
Based on 30 matches and 57 non-matches
Classified 170 matches and 763 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(170, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(763, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 170 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 170 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 43 matches and 15 non-matches
Purity of oracle classification: 0.741
Entropy of oracle classification: 0.825
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 15
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analyzing the file: diverg(15)822_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 822), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)822_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 786
Number of entity ID pairs that occurred more than once: 0
Analyse set of 786 weight vectors
Containing 208 true matches and 578 true non-matches
(26.46% true matches)
Identified 757 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 740 (97.75%)
2 : 14 (1.85%)
3 : 2 (0.26%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 757 unique weight vectors)
Pureness (as a percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 575
Removed 1 non-pure weight vector
Final number of weight vectors to use: 785
Number of unique weight vectors: 757
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(757, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 757 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 757 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 672 weight vectors
Based on 28 matches and 57 non-matches
Classified 131 matches and 541 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(541, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 541 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 541 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 12 matches and 61 non-matches
Purity of oracle classification: 0.836
Entropy of oracle classification: 0.645
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
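The purity and entropy figures reported after each oracle step appear to be the majority-class fraction and the binary Shannon entropy of the match/non-match split; a minimal sketch under that assumption (the function name is illustrative, not from the original program):

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a cluster."""
    total = num_matches + num_non_matches
    purity = max(num_matches, num_non_matches) / total
    p = num_matches / total
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)  # Shannon entropy in bits
    return purity, entropy

# Reproducing the summary above: 12 matches, 61 non-matches
purity, entropy = purity_and_entropy(12, 61)
print(round(purity, 3), round(entropy, 3))  # 0.836 0.645
```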
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analyzing file: diverg(15)299_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (15, 1 - acm diverg, 299), dtype: object
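The precision, recall, and f-measure values in each per-file summary are consistent with the standard F1 score over the logged tp/fp/fn counts; a sketch (helper name is illustrative):

```python
def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Values from the block above: tp=47, fp=1, fn=252
print(round(f_measure(47, 1, 252), 6))  # 0.270893
```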
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)299_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 616
Number of entity ID pairs that occurred more than once: 0
Analyse set of 616 weight vectors
Containing 201 true matches and 415 true non-matches
(32.63% true matches)
Identified 582 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 566 (97.25%)
2 : 13 (2.23%)
3 : 2 (0.34%)
18 : 1 (0.17%)
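A frequency distribution like the one above (how many unique vectors occur once, twice, etc.) can be derived with two passes of `collections.Counter`; a sketch with toy data:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to the number of unique vectors
    that occur exactly that often."""
    vector_counts = Counter(map(tuple, weight_vectors))  # vector -> occurrences
    distribution = Counter(vector_counts.values())       # occurrences -> #vectors
    return dict(sorted(distribution.items()))

# Toy example: one duplicated vector and two singletons
vectors = [[0.5, 1.0], [0.5, 1.0], [0.2, 0.3], [1.0, 0.0]]
print(occurrence_distribution(vectors))  # {1: 2, 2: 1}
```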
Identified 1 non-pure unique weight vector (from 582 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 412
Removed 1 non-pure weight vector
Final number of weight vectors to use: 615
Number of unique weight vectors: 582
Time to load and analyse the weight vector file: 0.01 sec
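The non-pure-vector cleanup above (pureness 0.944 means 17 of 18 copies of one duplicated vector carry the same label, and the single minority-class copy is dropped) could look like this; one plausible implementation, not the original code:

```python
from collections import Counter

def remove_minority_copies(vectors, labels):
    """For any duplicated vector carrying both labels (non-pure),
    drop the copies belonging to its minority class."""
    match_counts, total_counts = Counter(), Counter()
    for v, y in zip(vectors, labels):
        total_counts[tuple(v)] += 1
        match_counts[tuple(v)] += y
    kept_vectors, kept_labels = [], []
    for v, y in zip(vectors, labels):
        matches, total = match_counts[tuple(v)], total_counts[tuple(v)]
        majority = 1 if matches * 2 >= total else 0
        # Keep pure vectors as-is; for non-pure ones keep only the majority label
        if matches in (0, total) or y == majority:
            kept_vectors.append(v)
            kept_labels.append(y)
    return kept_vectors, kept_labels

# Toy example: [0.5] occurs 3 times with labels 1,1,0 -> the 0-copy is dropped
kept_v, kept_y = remove_minority_copies([[0.5], [0.5], [0.5], [0.1]], [1, 1, 0, 0])
print(kept_v, kept_y)  # [[0.5], [0.5], [0.1]] [1, 1, 0]
```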
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(582, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 582 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 582 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
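The "farthest first selection" used to draw each sample is presumably a greedy farthest-first traversal: starting from one vector, repeatedly pick the vector whose minimum distance to the already-selected set is largest. A sketch under that assumption (Euclidean distance, arbitrary starting point):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of weight vectors."""
    selected = [vectors[0]]           # arbitrary starting vector
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the vector maximizing its min distance to the selected set
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example in 2D: the point farthest from [0, 0] is chosen second
print(farthest_first([[0, 0], [1, 1], [0.1, 0], [2, 2]], 2))  # [[0, 0], [2, 2]]
```

This greedy scheme tends to spread the sample across the whole cluster, which is why the selected vectors above mix clear matches and clear non-matches.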
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 31 matches and 51 non-matches
Purity of oracle classification: 0.622
Entropy of oracle classification: 0.957
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 500 weight vectors
Based on 31 matches and 51 non-matches
Classified 142 matches and 358 non-matches
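The log shows an SVM trained on the oracle-labelled sample (31 matches / 51 non-matches) splitting the remaining 500 vectors into two sub-clusters. The actual kernel and parameters are not logged; the sketch below uses scikit-learn's `SVC` with a linear kernel on synthetic stand-in data (all names, settings, and the data itself are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the oracle-labelled sample (1 = match, 0 = non-match)
train_X = np.vstack([rng.uniform(0.7, 1.0, (31, 7)),   # match-like vectors
                     rng.uniform(0.0, 0.4, (51, 7))])  # non-match-like vectors
train_y = np.array([1] * 31 + [0] * 51)

clf = SVC(kernel="linear").fit(train_X, train_y)

# Split the unlabelled remainder of the cluster into two sub-clusters
unlabelled = rng.uniform(0.0, 1.0, (500, 7))
pred = clf.predict(unlabelled)
matches = unlabelled[pred == 1]
non_matches = unlabelled[pred == 0]
print(len(matches), len(non_matches))
```

Both sub-clusters inherit the parent's purity/entropy estimates until they are sampled themselves, which is why the two queue entries in Loop 2 carry identical statistics.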
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
(358, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
Current size of match and non-match training data sets: 31 / 51
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 142 weight vectors
- Estimated match proportion 0.378
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 51 matches and 4 non-matches
Purity of oracle classification: 0.927
Entropy of oracle classification: 0.376
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
48.0
Analyzing file: diverg(15)821_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 821), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)821_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 799
Number of entity ID pairs that occurred more than once: 0
Analyse set of 799 weight vectors
Containing 213 true matches and 586 true non-matches
(26.66% true matches)
Identified 747 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 712 (95.31%)
2 : 32 (4.28%)
3 : 2 (0.27%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 747 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 565
Removed 1 non-pure weight vector
Final number of weight vectors to use: 798
Number of unique weight vectors: 747
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(747, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 747 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 747 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 662 weight vectors
Based on 27 matches and 58 non-matches
Classified 86 matches and 576 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(86, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(576, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 86 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 86 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.956, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.950, 0.923, 0.941] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 43 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analyzing file: diverg(15)925_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976744
recall 0.421405
f-measure 0.588785
da 129
dm 0
ndm 0
tp 126
fp 3
tn 4.76529e+07
fn 173
Name: (15, 1 - acm diverg, 925), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)925_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 946
Number of entity ID pairs that occurred more than once: 0
Analyse set of 946 weight vectors
Containing 138 true matches and 808 true non-matches
(14.59% true matches)
Identified 912 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 883 (96.82%)
2 : 26 (2.85%)
3 : 2 (0.22%)
5 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 912 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 124
0.000 : 788
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 946
Number of unique weight vectors: 912
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(912, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 912 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 912 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 825 weight vectors
Based on 27 matches and 60 non-matches
Classified 89 matches and 736 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(89, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(736, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 89 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 89 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
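Farthest-first selection, used above to draw a diverse oracle sample from the cluster, greedily adds the vector whose distance to its nearest already-selected vector is largest. A minimal sketch; Euclidean distance and a fixed starting vector are assumptions here, since the program's actual seeding is not shown in the log:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: select k mutually distant vectors."""
    selected = [vectors[start]]
    # distance of every vector to its nearest selected vector so far
    nearest = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=nearest.__getitem__)
        selected.append(vectors[idx])
        nearest = [min(d, math.dist(v, vectors[idx]))
                   for d, v in zip(nearest, vectors)]
    return selected

# Toy example: from four 2-D points, pick the three most spread out
pts = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.0, 1.0]]
print(farthest_first(pts, 3))  # [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
```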
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 38 matches and 5 non-matches
Purity of oracle classification: 0.884
Entropy of oracle classification: 0.519
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
129.0
Analysing file: diverg(10)195_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 195), dtype: object
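The precision, recall, and f-measure rows in the Series above are consistent with the reported confusion counts (tp 68, fp 1, fn 231). A quick check of the standard definitions:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# tp=68, fp=1, fn=231, as reported for diverg(10)195_NEW.csv above
p, r, f = prf(68, 1, 231)
print(round(p, 6), round(r, 6), round(f, 6))  # 0.985507 0.227425 0.369565
```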
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)195_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 848
Number of entity ID pairs that occurred more than once: 0
Analyse set of 848 weight vectors
Containing 189 true matches and 659 true non-matches
(22.29% true matches)
Identified 808 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 774 (95.79%)
2 : 31 (3.84%)
3 : 2 (0.25%)
6 : 1 (0.12%)
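The frequency distribution above can be reproduced by counting duplicate weight vectors with `collections.Counter`, and its rows are internally consistent with the totals reported (808 unique out of 848 vectors):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of distinct vectors occurring that often."""
    per_vector = Counter(map(tuple, weight_vectors))   # vector -> occurrences
    return dict(sorted(Counter(per_vector.values()).items()))

print(occurrence_distribution([[1, 0], [1, 0], [0, 1]]))  # {1: 1, 2: 1}

# Sanity check against the distribution reported above:
dist = {1: 774, 2: 31, 3: 2, 6: 1}
assert sum(dist.values()) == 808                    # unique weight vectors
assert sum(k * n for k, n in dist.items()) == 848   # total weight vectors
```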
Identified 0 non-pure unique weight vectors (from 808 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.000 : 639
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 848
Number of unique weight vectors: 808
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(808, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 808 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 808 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 30 matches and 56 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 722 weight vectors
Based on 30 matches and 56 non-matches
Classified 168 matches and 554 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(168, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
(554, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
Current size of match and non-match training data sets: 30 / 56
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 554 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 554 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 0 matches and 75 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 75
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing file: diverg(15)802_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 802), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)802_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 506
Number of entity ID pairs that occurred more than once: 0
Analyse set of 506 weight vectors
Containing 205 true matches and 301 true non-matches
(40.51% true matches)
Identified 477 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 460 (96.44%)
2 : 14 (2.94%)
3 : 2 (0.42%)
12 : 1 (0.21%)
Identified 1 non-pure unique weight vector (from 477 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 298
Removed 1 non-pure weight vector
Final number of weight vectors to use: 505
Number of unique weight vectors: 477
Time to load and analyse the weight vector file: 0.01 sec
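The non-pure vector above occurs 12 times with pureness 0.917 (11 matches, 1 non-match), and removing its minority-class copy takes the set from 506 to 505 vectors. A hedged sketch of that removal step; the exact rule used by the program is an assumption:

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """Drop minority-class copies of non-pure duplicated weight vectors.

    For each distinct vector, pureness = fraction of its copies labelled
    as matches; copies carrying the minority label are discarded.
    """
    groups = defaultdict(list)
    for v, y in zip(vectors, labels):
        groups[tuple(v)].append(y)
    kept = []
    for v, y in zip(vectors, labels):
        ys = groups[tuple(v)]
        majority_is_match = sum(ys) / len(ys) >= 0.5
        if y == majority_is_match:
            kept.append((v, y))
    return kept

# Mirrors the log: one vector occurring 12 times (11 matches, 1 non-match)
vs = [[0.9, 0.9]] * 12 + [[0.1, 0.1]]
ys = [True] * 11 + [False] + [False]
print(len(remove_minority_copies(vs, ys)))  # 12
```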
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(477, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 477 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 477 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 36 matches and 44 non-matches
Purity of oracle classification: 0.550
Entropy of oracle classification: 0.993
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 397 weight vectors
Based on 36 matches and 44 non-matches
Classified 281 matches and 116 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(281, 0.55, 0.9927744539878084, 0.45)
(116, 0.55, 0.9927744539878084, 0.45)
Current size of match and non-match training data sets: 36 / 44
Selected cluster with (queue ordering: random):
- Purity 0.55 and entropy 0.99
- Size 116 weight vectors
- Estimated match proportion 0.450
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 116 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.488, 1.000, 0.231, 0.167, 0.105, 0.122, 0.000] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.619, 1.000, 0.103, 0.163, 0.129, 0.146, 0.213] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 1 match and 51 non-matches
Purity of oracle classification: 0.981
Entropy of oracle classification: 0.137
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(10)740_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 740), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)740_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 779
Number of entity ID pairs that occurred more than once: 0
Analyse set of 779 weight vectors
Containing 222 true matches and 557 true non-matches
(28.50% true matches)
Identified 725 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 688 (94.90%)
2 : 34 (4.69%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 725 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 536
Removed 1 non-pure weight vector
Final number of weight vectors to use: 778
Number of unique weight vectors: 725
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(725, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 725 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 725 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
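The "farthest first selection" that produces lists like the one above greedily picks, at each step, the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and seeding from the first vector (the log does not state either choice):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector, then
    repeatedly add the vector whose minimum Euclidean distance to the
    already-selected set is largest."""
    selected = [vectors[0]]
    # Minimum distance from every vector to the selected set so far.
    dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        i = max(range(len(vectors)), key=dist.__getitem__)
        selected.append(vectors[i])
        # Selecting vectors[i] can only shrink each minimum distance.
        dist = [min(d, math.dist(v, vectors[i])) for d, v in zip(dist, vectors)]
    return selected
```

Because each new pick maximises the minimum distance to the current sample, the selection spreads over the whole cluster rather than concentrating in dense regions.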
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 32 matches and 53 non-matches
Purity of oracle classification: 0.624
Entropy of oracle classification: 0.956
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
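The purity and entropy figures reported after each oracle round are the majority-class fraction and the binary Shannon entropy (in bits) of the match proportion. A minimal sketch (the function name is illustrative):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary Shannon
    entropy of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

For the round above (32 matches, 53 non-matches) this gives purity 53/85 ≈ 0.624 and entropy ≈ 0.956, matching the logged values; a perfectly pure round such as 0 matches / 77 non-matches gives purity 1.000 and entropy 0.000.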
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 640 weight vectors
Based on 32 matches and 53 non-matches
Classified 300 matches and 340 non-matches
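The "SVM classification" step trains on the oracle-labelled vectors and classifies the remaining ones. The log does not name the implementation; a sketch assuming scikit-learn's `SVC` with a linear kernel:

```python
# Assumption: scikit-learn's SVC stands in for the unnamed SVM used in the log.
from sklearn.svm import SVC

def svm_split(labelled, unlabelled):
    """labelled: list of (weight_vector, is_match) pairs from the oracle.
    Returns the unlabelled vectors predicted as matches and non-matches."""
    X = [v for v, _ in labelled]
    y = [m for _, m in labelled]
    clf = SVC(kernel="linear")
    clf.fit(X, y)
    pred = clf.predict(unlabelled)
    matches = [v for v, p in zip(unlabelled, pred) if p]
    non_matches = [v for v, p in zip(unlabelled, pred) if not p]
    return matches, non_matches
```

The two predicted groups then re-enter the queue as separate clusters, each initialised with the purity/entropy statistics of the oracle round that produced the training data.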
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(300, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
(340, 0.6235294117647059, 0.9555111232924128, 0.3764705882352941)
Current size of match and non-match training data sets: 32 / 53
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 300 weight vectors
- Estimated match proportion 0.376
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 300 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.600, 1.000, 0.217, 0.132, 0.167, 0.125, 0.188] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 44 matches and 25 non-matches
Purity of oracle classification: 0.638
Entropy of oracle classification: 0.945
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(15)944_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 944), dtype: object
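The precision, recall, and f-measure values in the Series above follow directly from its tp/fp/fn counts (the function name below is illustrative):

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

With tp = 39, fp = 0, fn = 260 this reproduces the logged precision 1, recall 0.130435, and f-measure 0.230769.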
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)944_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 226 true matches and 857 true non-matches
(20.87% true matches)
Identified 1026 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 989 (96.39%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
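A frequency distribution like the one above (how many distinct weight vectors occur once, twice, and so on) can be computed with two nested counts:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each distinct weight vector occurs, then tally how
    many distinct vectors share each occurrence count."""
    per_vector = Counter(map(tuple, weight_vectors))
    return Counter(per_vector.values())
```

For the run above, 989·1 + 34·2 + 2·3 + 1·20 = 1083 vectors in total, of which 989 + 34 + 2 + 1 = 1026 are unique, matching the logged counts.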
Identified 1 non-pure unique weight vector (from 1026 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
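Removing the minority-class rows of non-pure unique weight vectors can be sketched as follows (breaking ties toward the match class is an assumption; the log never shows a tie):

```python
from collections import defaultdict

def remove_minority_rows(rows):
    """rows: list of (weight_vector, is_match). For each unique vector with
    mixed labels, drop the rows of the minority class."""
    groups = defaultdict(list)
    for vec, is_match in rows:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, is_match in rows:
        labels = groups[tuple(vec)]
        matches = sum(labels)
        if 0 < matches < len(labels):           # non-pure unique vector
            majority = matches * 2 >= len(labels)
            if is_match != majority:
                continue                        # minority-class row removed
        kept.append((vec, is_match))
    return kept
```

In the run above, the one non-pure unique vector occurs 20 times with pureness 0.950 (19 matches, 1 non-match); removing its single minority-class row shrinks the data from 1083 to 1082 weight vectors, as logged.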
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1026
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1026, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1026 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1026 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 30 matches and 58 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 938 weight vectors
Based on 30 matches and 58 non-matches
Classified 177 matches and 761 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(177, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
(761, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
Current size of match and non-match training data sets: 30 / 58
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 761 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 761 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)84_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 84), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)84_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 801
Number of entity ID pairs that occurred more than once: 0
Analyse set of 801 weight vectors
Containing 220 true matches and 581 true non-matches
(27.47% true matches)
Identified 763 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 745 (97.64%)
2 : 15 (1.97%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 763 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 578
Removed 1 non-pure weight vector
Final number of weight vectors to use: 800
Number of unique weight vectors: 763
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(763, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 763 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 763 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 678 weight vectors
Based on 28 matches and 57 non-matches
Classified 135 matches and 543 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 135 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 135 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
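The repeated "Farthest first selection of k weight vectors from n vectors" steps in this log can be sketched as a greedy max-min traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. This is a hypothetical reconstruction; the distance metric and the choice of the first vector are assumptions (Euclidean, first vector in the list), since the log does not state them.

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of weight vectors.
    Euclidean distance and starting from vectors[0] are assumptions."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]
    # minimum distance from each candidate to the selected set so far
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: min_dist[j])
        selected.append(vectors[i])
        # tighten each candidate's distance with the newly selected vector
        for j, v in enumerate(vectors):
            d = dist(v, vectors[i])
            if d < min_dist[j]:
                min_dist[j] = d
    return selected
```

Because the next pick always maximizes the distance to everything chosen so far, the sample spreads across the whole cluster instead of concentrating in one region, which is why the selected vectors above mix clear matches and clear non-matches.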
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
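The purity and entropy figures reported after each oracle call are consistent with the majority-class fraction and the binary Shannon entropy of the labeled sample (here 49 matches and 3 non-matches give 0.942 and 0.318). A sketch of those formulas as inferred from the logged values; the function names are illustrative, not from the source code:

```python
import math

def purity(n_match, n_nonmatch):
    # fraction of the sample belonging to the majority class
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    # binary Shannon entropy (in bits) of the match / non-match split
    total = n_match + n_nonmatch
    h = 0.0
    for n in (n_match, n_nonmatch):
        p = n / total
        if p > 0:
            h -= p * math.log2(p)
    return h
```

A pure cluster (all one class) has purity 1.0 and entropy 0.0, which is why low entropy is used below as a signal that a cluster needs no further splitting.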
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(15)473_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 473), dtype: object
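The per-file summary blocks (precision, recall, f-measure, tp, fp, fn) are internally consistent: with tp 40 and fn 259, recall is 40/299 ≈ 0.133779, and the f-measure is the harmonic mean of precision and recall. A small sketch of those identities; the function names are illustrative, not from the source code:

```python
def recall_from_counts(tp, fn):
    # recall: fraction of true matches that were found
    return tp / (tp + fn)

def f_measure(precision, recall):
    # F1 score: harmonic mean of precision and recall
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For the block above, f_measure(1.0, 0.133779) ≈ 0.235988, matching the logged value.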
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)473_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 945
Number of entity ID pairs that occurred more than once: 0
Analyse set of 945 weight vectors
Containing 219 true matches and 726 true non-matches
(23.17% true matches)
Identified 890 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 854 (95.96%)
2 : 33 (3.71%)
3 : 2 (0.22%)
19 : 1 (0.11%)
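Occurrence histograms like the one above can be produced by counting duplicate vectors and then counting the counts; a minimal sketch using collections.Counter (the function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(vectors):
    # count how often each exact weight vector occurs ...
    per_vector = Counter(tuple(v) for v in vectors)
    # ... then count how many unique vectors share each occurrence count
    return Counter(per_vector.values())
```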
Identified 1 non-pure unique weight vector (from 890 unique weight vectors)
Pureness (as the fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 705
Removed 1 non-pure weight vector
Final number of weight vectors to use: 944
Number of unique weight vectors: 890
Time to load and analyse the weight vector file: 0.01 sec
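The pureness step above appears to group identical weight vectors by their true label and drop the minority-class copies of any vector that carries both labels (945 to 944 vectors here). A sketch of that inferred behavior; the function name and the tie handling are assumptions:

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """For each unique weight vector, keep only the copies carrying its
    majority true label; minority-class duplicates are dropped.
    Ties resolve toward non-match here, which is an assumption."""
    counts = defaultdict(lambda: [0, 0])  # vector -> [non-match, match] copies
    for v, y in zip(vectors, labels):
        counts[tuple(v)][int(y)] += 1
    kept = []
    for v, y in zip(vectors, labels):
        neg, pos = counts[tuple(v)]
        majority = pos > neg
        if y == majority:
            kept.append((v, y))
    return kept
```

For the run above, a vector occurring 19 times with 18 match copies has pureness 18/19 ≈ 0.947, and only its single non-match copy is removed.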
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(890, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 890 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 890 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 804 weight vectors
Based on 24 matches and 62 non-matches
Classified 130 matches and 674 non-matches
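Each SVM split trains on the oracle-labeled samples and partitions the remaining unlabeled vectors into predicted matches and non-matches. A minimal sketch assuming scikit-learn; the kernel and parameters are not stated in the log, so library defaults are used:

```python
from sklearn.svm import SVC

def svm_split(labeled_vectors, labeled_classes, unlabeled_vectors):
    """Train an SVM on oracle-labeled weight vectors and split the
    remaining vectors by its predictions. Kernel and parameters are
    assumptions (scikit-learn defaults); the log does not specify them."""
    clf = SVC()  # default RBF kernel
    clf.fit(labeled_vectors, labeled_classes)
    pred = clf.predict(unlabeled_vectors)
    matches = [v for v, p in zip(unlabeled_vectors, pred) if p]
    non_matches = [v for v, p in zip(unlabeled_vectors, pred) if not p]
    return matches, non_matches
```

The two resulting subsets become the new clusters in the queue, as seen in the Loop 2 header below (sizes 130 and 674).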
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(130, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(674, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 130 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 130 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)850_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.197324
f-measure 0.329609
da 59
dm 0
ndm 0
tp 59
fp 0
tn 4.76529e+07
fn 240
Name: (10, 1 - acm diverg, 850), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)850_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 661
Number of entity ID pairs that occurred more than once: 0
Analyse set of 661 weight vectors
Containing 198 true matches and 463 true non-matches
(29.95% true matches)
Identified 616 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 582 (94.48%)
2 : 31 (5.03%)
3 : 2 (0.32%)
11 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 616 unique weight vectors)
Pureness (as the fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 442
Removed 1 non-pure weight vector
Final number of weight vectors to use: 660
Number of unique weight vectors: 616
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(616, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 616 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 616 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 533 weight vectors
Based on 28 matches and 55 non-matches
Classified 157 matches and 376 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(157, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(376, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 376 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 376 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.750, 0.524, 0.400, 0.813, 0.611] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.600, 0.857, 0.579, 0.286, 0.545] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.417, 0.750, 0.500, 0.455] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.857, 0.444, 0.556, 0.235, 0.500] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.714, 0.318, 0.583, 0.417, 0.727] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 1 match and 69 non-matches
Purity of oracle classification: 0.986
Entropy of oracle classification: 0.108
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
59.0
Analysing file: diverg(20)845_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 845), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)845_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as the fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
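The purity and entropy figures above can be reproduced from the oracle's match / non-match counts: purity is the majority-class fraction and entropy is the binary (Shannon) entropy of the match proportion. A minimal sketch, assuming those are the definitions the tool uses:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion (0.326 for 28 of 86)
    purity = max(p, 1 - p)
    # Binary entropy in bits; defined as 0 for a perfectly pure cluster.
    if p in (0.0, 1.0):
        entropy = 0.0
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

purity, entropy = purity_entropy(28, 58)
print(f"{purity:.3f} {entropy:.3f}")  # matches the 0.674 / 0.910 reported above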
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
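The SVM step presumably trains a classifier on the oracle-labelled vectors and predicts the remaining unlabelled ones, splitting the cluster into match / non-match children. A sketch using scikit-learn's `SVC` — the log does not name the library, kernel, or parameters, so those are assumptions:

```python
import numpy as np
from sklearn.svm import SVC  # assumed implementation; not named in the log

# Toy stand-ins for the oracle-labelled training vectors and the rest.
X_train = np.array([[0.9, 1.0], [0.8, 0.9], [0.1, 0.0], [0.2, 0.1]])
y_train = np.array([1, 1, 0, 0])           # 1 = match, 0 = non-match
X_rest = np.array([[0.85, 0.95], [0.15, 0.05]])

clf = SVC(kernel="linear").fit(X_train, y_train)
pred = clf.predict(X_rest)
# The predicted labels partition the remaining vectors into two child clusters.
```

The two child clusters are then re-queued with the parent's purity/entropy estimates, as the queue listing in the next loop shows.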
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 153 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 153 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
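The farthest-first selection above is a greedy traversal: each next vector is the one whose minimum distance to the already-selected set is largest. A sketch assuming Euclidean distance and seeding with the first vector (the actual seed rule is not shown in the log):

```python
def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of numeric vectors."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    selected = [0]  # assumption: seed with the first vector
    while len(selected) < k:
        best, best_d = None, -1.0
        for i in range(len(vectors)):
            if i in selected:
                continue
            # Distance of candidate i to its nearest already-selected vector.
            d = min(dist(vectors[i], vectors[j]) for j in selected)
            if d > best_d:
                best, best_d = i, d
        selected.append(best)
    return [vectors[i] for i in selected]

sample = farthest_first([[0.0], [10.0], [1.0], [9.0], [5.0]], 3)
```

On the toy input the traversal picks the two extremes first and then the midpoint, which is the spread-maximising behaviour the sampler relies on.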
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
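The "oracle with 100.00% accuracy" lines suggest a simulated human oracle that returns the true label with the stated probability and flips it otherwise. A sketch of that assumption:

```python
import random

def simulate_oracle(true_labels, accuracy, seed=0):
    """Return oracle answers: each true label is kept with probability
    `accuracy` and flipped otherwise."""
    rng = random.Random(seed)
    return [lbl if rng.random() < accuracy else not lbl for lbl in true_labels]

labels = [True] * 50 + [False] * 5
answers = simulate_oracle(labels, accuracy=1.0)
# At 100% accuracy every answer equals the true label: 50 matches, 5 non-matches.
```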
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)843_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 843), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)843_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 932
Number of entity ID pairs that occurred more than once: 0
Analyse set of 932 weight vectors
Containing 200 true matches and 732 true non-matches
(21.46% true matches)
Identified 887 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 853 (96.17%)
2 : 31 (3.49%)
3 : 2 (0.23%)
11 : 1 (0.11%)
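The occurrence table above is a frequency-of-frequencies count over the weight vectors; a minimal sketch using `collections.Counter` on hypothetical data:

```python
from collections import Counter

# Hypothetical weight vectors (as tuples, so they are hashable).
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9), (0.2, 0.9), (0.2, 0.9), (0.7, 0.1)]

vector_counts = Counter(vectors)                # how often each unique vector occurs
freq_of_freq = Counter(vector_counts.values())  # occurrence : number of such vectors

for occurrence, n in sorted(freq_of_freq.items()):
    pct = 100.0 * n / len(vector_counts)
    print(f"{occurrence} : {n} ({pct:.2f}%)")
```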
Identified 1 non-pure unique weight vector (from 887 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 711
Removed 1 non-pure weight vector
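Pureness groups identical weight vectors and measures the fraction of matches among their occurrences; for each non-pure vector, the minority-class copies are dropped. A sketch under those assumptions — the toy data reproduces the 0.909 entry above (10 matches, 1 non-match):

```python
from collections import defaultdict

def remove_minority(pairs):
    """pairs: list of (vector_tuple, is_match). For every unique vector with
    0 < pureness < 1, drop its minority-class copies."""
    groups = defaultdict(list)
    for vec, is_match in pairs:
        groups[vec].append(is_match)

    kept = []
    for vec, labels in groups.items():
        matches = sum(labels)
        pureness = matches / len(labels)  # fraction of matches, as in the log
        if pureness in (0.0, 1.0):
            kept.extend((vec, lbl) for lbl in labels)  # already pure
        else:
            majority = matches > len(labels) - matches
            kept.extend((vec, lbl) for lbl in labels if lbl == majority)
    return kept

pairs = [((0.9,), True)] * 10 + [((0.9,), False)] + [((0.1,), False)] * 3
cleaned = remove_minority(pairs)
# One minority copy removed: 14 pairs in, 13 out.
```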
Final number of weight vectors to use: 931
Number of unique weight vectors: 887
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(887, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 887 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 887 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 801 weight vectors
Based on 28 matches and 58 non-matches
Classified 158 matches and 643 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(158, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(643, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 643 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 643 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 1 match and 73 non-matches
Purity of oracle classification: 0.986
Entropy of oracle classification: 0.103
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(10)7_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.982759
recall 0.190635
f-measure 0.319328
da 58
dm 0
ndm 0
tp 57
fp 1
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 7), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)7_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 932
Number of entity ID pairs that occurred more than once: 0
Analyse set of 932 weight vectors
Containing 200 true matches and 732 true non-matches
(21.46% true matches)
Identified 881 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 847 (96.14%)
2 : 31 (3.52%)
3 : 2 (0.23%)
17 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 881 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 711
Removed 1 non-pure weight vector
Final number of weight vectors to use: 931
Number of unique weight vectors: 881
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(881, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 881 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 881 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 795 weight vectors
Based on 28 matches and 58 non-matches
Classified 140 matches and 655 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(655, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 140 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 140 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00 accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 48 matches and 5 non-matches
Purity of oracle classification: 0.906
Entropy of oracle classification: 0.451
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(10)777_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 777), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)777_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 697
Number of entity ID pairs that occurred more than once: 0
Analyse set of 697 weight vectors
Containing 198 true matches and 499 true non-matches
(28.41% true matches)
Identified 652 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 618 (94.79%)
2 : 31 (4.75%)
3 : 2 (0.31%)
11 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 652 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 478
Removed 1 non-pure weight vector
Final number of weight vectors to use: 696
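The pureness cleaning step above (dropping the minority-class copies of each non-pure unique vector, here 696 = 697 - 1) can be sketched as follows; the tie-breaking rule is an assumption, not taken from the log:

```python
from collections import defaultdict

def remove_minority_duplicates(vectors, labels):
    """For each unique vector, drop its minority-class copies so every
    unique vector becomes pure (all-match or all-non-match)."""
    groups = defaultdict(list)
    for v, lab in zip(vectors, labels):
        groups[tuple(v)].append(lab)
    kept_vectors, kept_labels = [], []
    for v, lab in zip(vectors, labels):
        labs = groups[tuple(v)]
        majority = sum(labs) * 2 >= len(labs)  # ties kept as matches (assumption)
        if lab == majority:
            kept_vectors.append(v)
            kept_labels.append(lab)
    return kept_vectors, kept_labels
```

For a vector occurring 17 times with 16 match labels (pureness 0.941, as in a later run), the single non-match copy is removed.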
Number of unique weight vectors: 652
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(652, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 652 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 652 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
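The farthest-first sampling above is presumably the classic greedy traversal, where each new pick maximises the distance to its nearest already-selected vector; a sketch assuming Euclidean distance and an arbitrary seed:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: select k vectors, each maximising
    its distance to the closest already-selected vector."""
    selected = [vectors[0]]  # seed with an arbitrary starting vector
    # min_dist[i] = distance from vectors[i] to its nearest selected vector
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[idx]))
    return selected
```

This greedy scheme spreads the manually classified sample over the whole cluster, which is why the selected vectors above mix very dissimilar weight profiles.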
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 25 matches and 58 non-matches
Purity of oracle classification: 0.699
Entropy of oracle classification: 0.883
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 569 weight vectors
Based on 25 matches and 58 non-matches
Classified 143 matches and 426 non-matches
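The SVM step trains on the oracle-labelled sample (25 matches, 58 non-matches) and partitions the remaining unlabelled vectors into predicted match and non-match clusters; a sketch using scikit-learn, where the kernel choice and all settings are assumptions:

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on oracle-labelled vectors, then partition the rest
    into predicted matches and non-matches."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    predictions = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, predictions) if p]
    non_matches = [v for v, p in zip(remaining_vectors, predictions) if not p]
    return matches, non_matches
```

The two resulting clusters are then pushed back onto the queue, which is why the queue length grows to 2 in the next loop.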
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(143, 0.6987951807228916, 0.8827586787955115, 0.30120481927710846)
(426, 0.6987951807228916, 0.8827586787955115, 0.30120481927710846)
Current size of match and non-match training data sets: 25 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 426 weight vectors
- Estimated match proportion 0.301
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 426 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.423, 0.478, 0.500, 0.813, 0.545] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.269, 0.478, 0.750, 0.385, 0.455] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.636, 0.429, 0.632, 0.250, 0.750] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.947, 1.000, 0.292, 0.178, 0.227, 0.122, 0.154] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 8 matches and 60 non-matches
Purity of oracle classification: 0.882
Entropy of oracle classification: 0.523
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(10)613_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 613), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)613_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 662
Number of entity ID pairs that occurred more than once: 0
Analyse set of 662 weight vectors
Containing 217 true matches and 445 true non-matches
(32.78% true matches)
Identified 629 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 613 (97.46%)
2 : 13 (2.07%)
3 : 2 (0.32%)
17 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 629 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 444
Removed 1 non-pure weight vector
Final number of weight vectors to use: 661
Number of unique weight vectors: 629
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(629, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 629 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 629 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 33 matches and 50 non-matches
Purity of oracle classification: 0.602
Entropy of oracle classification: 0.970
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 546 weight vectors
Based on 33 matches and 50 non-matches
Classified 176 matches and 370 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(176, 0.6024096385542169, 0.9695235828220428, 0.39759036144578314)
(370, 0.6024096385542169, 0.9695235828220428, 0.39759036144578314)
Current size of match and non-match training data sets: 33 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 176 weight vectors
- Estimated match proportion 0.398
Sample size for this cluster: 61
Farthest first selection of 61 weight vectors from 176 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.890, 1.000, 0.281, 0.136, 0.183, 0.250, 0.163] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 61 weight vectors
The oracle will correctly classify 61 weight vectors and wrongly classify 0
Classified 44 matches and 17 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 17
Number of false non-matches: 0
Deleted 61 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)137_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 137), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)137_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and misclassify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
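The "farthest first" sampling used in each loop can be sketched as the classic greedy farthest-point traversal; Euclidean distance and seeding from the first vector are assumptions, since the log does not show the metric:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily pick k vectors, each maximizing its distance to the picked set."""
    selected = [start]
    # Minimum distance of every vector to the selected set so far.
    min_d = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], math.dist(v, vectors[nxt]))
    return selected

# Toy example: pick 3 of 5 two-dimensional points
pts = [(0, 0), (0.1, 0), (1, 1), (0.5, 0.5), (0, 1)]
picked = farthest_first(pts, 3)  # -> [0, 2, 4]
```

This spreads the oracle budget across the cluster, which is why the selected vectors above mix extreme values like all-ones and all-zeros rows.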
39.0
Analysing the file: diverg(20)544_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 544), dtype: object
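The precision, recall, and f-measure printed per file follow from the tp/fp/fn counts in the usual way; for the block above (tp=40, fp=0, fn=259):

```python
def prf(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = prf(tp=40, fp=0, fn=259)
# p == 1.0, r ~ 0.133779, f ~ 0.235988, as printed in the log
```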
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)544_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
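The uniqueness and frequency statistics above can be reproduced with a `collections.Counter` over the rows. A sketch with made-up vectors (the real ones come from the CSV file), using tuples so the rows are hashable:

```python
from collections import Counter

# Hypothetical stand-in for the loaded weight vectors.
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.2, 0.9)]

counts = Counter(vectors)             # occurrences per unique vector
n_unique = len(counts)                # "Identified N unique weight vectors"
freq_dist = Counter(counts.values())  # occurrence -> number of unique vectors
# here: 3 unique vectors, freq_dist == {1: 2, 2: 1}
```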
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
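The non-pure handling above appears to group identical weight vectors, compute each group's match fraction, and drop the minority-class copies of any mixed group (e.g. 18 matches + 1 non-match gives pureness 0.947 and one removed copy). A sketch under that assumption:

```python
from collections import defaultdict

def remove_minority(records):
    """records: list of (weight_vector_tuple, is_match).
    Drop minority-class copies of every unique vector with mixed labels."""
    groups = defaultdict(list)
    for vec, label in records:
        groups[vec].append(label)
    kept = []
    for vec, labels in groups.items():
        if 0 < sum(labels) < len(labels):               # mixed (non-pure) group
            majority = sum(labels) >= len(labels) / 2   # ties kept as matches (assumption)
            kept += [(vec, majority)] * labels.count(majority)
        else:                                           # pure group: keep everything
            kept += [(vec, lab) for lab in labels]
    return kept

# Toy example: one mixed group (2 matches, 1 non-match) and one pure group
recs = [((1.0,), True), ((1.0,), True), ((1.0,), False), ((0.2,), False)]
cleaned = remove_minority(recs)  # one non-pure copy removed: 4 records in, 3 out
```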
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and misclassify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and misclassify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
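The "not pure enough or too large" check that drives each split can be sketched as a simple threshold test; the actual purity and size thresholds are configuration values not shown in the log, so the ones below are hypothetical:

```python
def should_split(purity, size, min_purity=0.95, max_size=100):
    """Hypothetical thresholds: split while the cluster is impure or still large."""
    return purity < min_purity or size > max_size

# e.g. the cluster above: purity 0.812, 432 remaining vectors -> split further
```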
Reached end of manual classification budget
40.0
Analysing the file: diverg(15)147_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 147), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)147_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 794
Number of entity ID pairs that occurred more than once: 0
Analyse set of 794 weight vectors
Containing 221 true matches and 573 true non-matches
(27.83% true matches)
Identified 740 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 703 (95.00%)
2 : 34 (4.59%)
3 : 2 (0.27%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 740 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 187
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 552
Removed 1 non-pure weight vector
Final number of weight vectors to use: 793
Number of unique weight vectors: 740
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(740, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 740 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 740 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and misclassify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 655 weight vectors
Based on 28 matches and 57 non-matches
Classified 154 matches and 501 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(154, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(501, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 501 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 501 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.767, 0.667, 0.545, 0.786, 0.773] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and misclassify 0
Classified 4 matches and 68 non-matches
Purity of oracle classification: 0.944
Entropy of oracle classification: 0.310
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 68
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(10)44_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 44), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)44_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 717
Number of entity ID pairs that occurred more than once: 0
Analyse set of 717 weight vectors
Containing 193 true matches and 524 true non-matches
(26.92% true matches)
Identified 675 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 640 (94.81%)
2 : 32 (4.74%)
3 : 2 (0.30%)
7 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 675 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 171
0.000 : 504
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 717
Number of unique weight vectors: 675
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(675, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 675 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 675 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
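The "far" method used for the sample above is a farthest-first traversal; a minimal sketch (the seeding rule and Euclidean distance are assumptions, as the log does not state them):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: seed with the first vector, then
    repeatedly add the vector whose distance to its closest already
    selected vector is largest."""
    selected = [vectors[0]]
    while len(selected) < k and len(selected) < len(vectors):
        best = max(
            (v for v in vectors if v not in selected),
            key=lambda v: min(math.dist(v, s) for s in selected),
        )
        selected.append(best)
    return selected

# Toy example with 2-dimensional vectors:
print(farthest_first([(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.5, 0.5)], 2))
# -> [(0.0, 0.0), (1.0, 1.0)]
```

The greedy rule spreads the sample across the whole cluster, which is why the selected vectors above mix clear matches and clear non-matches rather than clustering around one region.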
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
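Purity and entropy of an oracle classification follow directly from the match/non-match counts; a minimal sketch reproducing the values printed above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the proportion of the majority class; entropy is the
    two-class Shannon entropy (in bits) of the match/non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# Counts from the oracle classification above: 31 matches, 53 non-matches.
purity, entropy = purity_entropy(31, 53)
# purity ≈ 0.631, entropy ≈ 0.950 -- matching the values printed in the log
```

A pure cluster (all matches or all non-matches) gives purity 1.0 and entropy 0.0, which is the stopping condition checked after each oracle round.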
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 591 weight vectors
Based on 31 matches and 53 non-matches
Classified 285 matches and 306 non-matches
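The SVM split of the remaining unlabelled vectors can be sketched as below, assuming scikit-learn is the backing library; the kernel and parameters here are assumptions, since the log does not state them:

```python
from sklearn.svm import SVC

def svm_split(match_vecs, non_match_vecs, unlabelled):
    """Train an SVM on the oracle-labelled weight vectors, then split the
    remaining unlabelled vectors into predicted matches / non-matches."""
    X = match_vecs + non_match_vecs
    y = [1] * len(match_vecs) + [0] * len(non_match_vecs)
    clf = SVC()  # default RBF kernel -- an assumption, not the tool's setting
    clf.fit(X, y)
    preds = clf.predict(unlabelled)
    matches = [v for v, p in zip(unlabelled, preds) if p == 1]
    non_matches = [v for v, p in zip(unlabelled, preds) if p == 0]
    return matches, non_matches
```

Each predicted side becomes a new cluster in the queue, inheriting the purity and entropy of the oracle sample it was trained on, as seen in the Loop 2 queue listing that follows.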
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(285, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(306, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 306 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 306 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.833, 0.571, 0.727, 0.647, 0.857] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.917, 0.786, 0.263, 0.500, 0.556] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.833, 0.364, 0.417, 0.800, 0.545] (False)
[0.800, 0.000, 0.625, 0.571, 0.467, 0.474, 0.667] (False)
[1.000, 0.000, 0.636, 0.429, 0.632, 0.250, 0.750] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.333, 0.214, 0.333, 0.588, 0.476] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.385, 0.391, 0.667, 0.579, 0.824] (False)
[1.000, 0.000, 0.750, 0.429, 0.526, 0.500, 0.846] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[1.000, 0.000, 0.067, 0.550, 0.818, 0.727, 0.762] (False)
[1.000, 0.000, 0.556, 0.222, 0.444, 0.429, 0.300] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.500, 0.600, 0.294, 0.600, 0.500] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.000, 0.700, 0.818, 0.444, 0.619] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 0 matches and 69 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(10)293_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 293), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)293_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 343
Number of entity ID pairs that occurred more than once: 0
Analyse set of 343 weight vectors
Containing 191 true matches and 152 true non-matches
(55.69% true matches)
Identified 322 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 308 (95.65%)
2 : 11 (3.42%)
3 : 2 (0.62%)
7 : 1 (0.31%)
Identified 0 non-pure unique weight vectors (from 322 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 170
0.000 : 152
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 343
Number of unique weight vectors: 322
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(322, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 322 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 74
Perform initial selection using "far" method
Farthest first selection of 74 weight vectors from 322 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 51 matches and 23 non-matches
Purity of oracle classification: 0.689
Entropy of oracle classification: 0.894
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 23
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 248 weight vectors
Based on 51 matches and 23 non-matches
Classified 248 matches and 0 non-matches
68.0
Analysing file: diverg(10)802_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984127
recall 0.207358
f-measure 0.342541
da 63
dm 0
ndm 0
tp 62
fp 1
tn 4.76529e+07
fn 237
Name: (10, 1 - acm diverg, 802), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)802_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 740
Number of entity ID pairs that occurred more than once: 0
Analyse set of 740 weight vectors
Containing 202 true matches and 538 true non-matches
(27.30% true matches)
Identified 690 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 656 (95.07%)
2 : 31 (4.49%)
3 : 2 (0.29%)
16 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 690 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 517
Removed 1 non-pure weight vector
Final number of weight vectors to use: 739
Number of unique weight vectors: 690
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(690, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 690 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 690 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 606 weight vectors
Based on 29 matches and 55 non-matches
Classified 140 matches and 466 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(466, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 140 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 140 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
Perform oracle with 100.00 accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 47 matches and 7 non-matches
Purity of oracle classification: 0.870
Entropy of oracle classification: 0.556
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
63.0
Analysing file: diverg(10)992_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 992), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)992_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 814
Number of entity ID pairs that occurred more than once: 0
Analyse set of 814 weight vectors
Containing 220 true matches and 594 true non-matches
(27.03% true matches)
Identified 758 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 722 (95.25%)
2 : 33 (4.35%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 758 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 573
Removed 1 non-pure weight vector
Final number of weight vectors to use: 813
Number of unique weight vectors: 758
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(758, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 758 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 758 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
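The "far" initial selection above is a farthest-first traversal: starting from one vector, it repeatedly adds the vector whose minimum distance to the already selected set is largest, so the sample spreads across the whole cluster. A sketch under the assumption of Euclidean distance and an arbitrary seed (the actual metric and seed are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Select k vectors by farthest-first traversal: greedily add the
    vector maximising its minimum distance to the selected set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [vectors[0]]  # arbitrary seed (an assumption)
    min_dist = [dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        # Each vector's distance to its nearest selected vector can only shrink
        min_dist = [min(d, dist(v, vectors[idx]))
                    for d, v in zip(min_dist, vectors)]
    return selected
```

This explains why the selected sample mixes extreme corner vectors (many 0.000/1.000 entries) rather than typical ones.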
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
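The purity and entropy reported for this oracle sample appear to be the majority-class fraction and the binary (base-2) entropy of the match proportion. A sketch reproducing 0.682 and 0.902 from the 27 matches and 58 non-matches above (the function name is illustrative):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction of the sample;
    entropy = binary (base-2) entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(27, 58)  # Loop 1 oracle sample above
# purity ≈ 0.682, entropy ≈ 0.902
```

The same values, at full precision, reappear in the Loop 2 queue entries, since the sample's match proportion (27/85 ≈ 0.318) is carried over as the estimated match proportion of both child clusters.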
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 673 weight vectors
Based on 27 matches and 58 non-matches
Classified 146 matches and 527 non-matches
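The split step trains an SVM on the 85 oracle-labelled vectors and uses it to divide the remaining 673 into predicted matches and non-matches, which become the two queue clusters of Loop 2. A minimal sketch assuming scikit-learn; the kernel and parameters of the actual program are not shown in the log:

```python
# Assumes scikit-learn is available; kernel choice is an assumption.
from sklearn import svm

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on the oracle-labelled sample, then split the
    remaining weight vectors into predicted matches and non-matches."""
    clf = svm.SVC(kernel="linear")
    clf.fit(train_vectors, train_labels)
    predictions = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, predictions) if p]
    non_matches = [v for v, p in zip(remaining_vectors, predictions) if not p]
    return matches, non_matches
```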
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(527, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 146 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 146 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(10)246_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 246), dtype: object
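The precision, recall and f-measure in the summary rows above follow directly from the tp/fp/fn counts. A sketch reproducing the values of this first summary (tp=81, fp=1, fn=218; the function name is illustrative):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 (f-measure) from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = prf(tp=81, fp=1, fn=218)  # counts from the summary above
# p ≈ 0.987805, r ≈ 0.270903, f ≈ 0.425197
```

Note that tn (≈4.77e7) does not enter any of the three measures, which is why it can be huge without affecting them.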
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)246_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 883
Number of entity ID pairs that occurred more than once: 0
Analyse set of 883 weight vectors
Containing 177 true matches and 706 true non-matches
(20.05% true matches)
Identified 844 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 814 (96.45%)
2 : 27 (3.20%)
3 : 2 (0.24%)
9 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 844 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 158
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 685
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 874
Number of unique weight vectors: 843
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(843, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 843 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 843 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 757 weight vectors
Based on 28 matches and 58 non-matches
Classified 139 matches and 618 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(139, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(618, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 618 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 618 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 1 match and 73 non-matches
Purity of oracle classification: 0.986
Entropy of oracle classification: 0.103
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(15)621_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 621), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)621_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 721
Number of entity ID pairs that occurred more than once: 0
Analyse set of 721 weight vectors
Containing 217 true matches and 504 true non-matches
(30.10% true matches)
Identified 666 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 630 (94.59%)
2 : 33 (4.95%)
3 : 2 (0.30%)
19 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 666 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 483
Removed 1 non-pure weight vector
Final number of weight vectors to use: 720
Number of unique weight vectors: 666
Time to load and analyse the weight vector file: 0.04 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(666, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 666 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 666 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 23 matches and 61 non-matches
Purity of oracle classification: 0.726
Entropy of oracle classification: 0.847
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 582 weight vectors
Based on 23 matches and 61 non-matches
Classified 0 matches and 582 non-matches
40.0
Analysing the file: diverg(15)579_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (15, 1 - acm diverg, 579), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)579_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 882
Number of entity ID pairs that occurred more than once: 0
Analyse set of 882 weight vectors
Containing 187 true matches and 695 true non-matches
(21.20% true matches)
Identified 842 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 808 (95.96%)
2 : 31 (3.68%)
3 : 2 (0.24%)
6 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 842 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 167
0.000 : 675
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 882
Number of unique weight vectors: 842
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(842, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 842 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 842 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
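The "far" initial-selection method logged above is a greedy farthest-first traversal; a minimal sketch under Euclidean distance (the seed choice and tie-breaking here are assumptions, and the logged run may differ in both):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: begin with one seed vector, then
    repeatedly add the vector whose distance to its nearest already
    selected vector is largest."""
    selected = [vectors[start]]
    # min_dist[j]: distance from vectors[j] to the nearest selected vector
    min_dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            d = math.dist(v, vectors[i])
            if d < min_dist[j]:
                min_dist[j] = d
    return selected
```

Each round costs O(n) distance updates, so selecting k of n vectors is O(nk); this is the standard trick that avoids recomputing all pairwise distances.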
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
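The purity and entropy reported for a cluster follow directly from the oracle's match and non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy (in bits) of the match proportion. A sketch that reproduces the logged values (0.674, 0.910, and match proportion 0.326 for 28 matches / 58 non-matches):

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Return (purity, entropy, match proportion) for a classified sample."""
    total = num_matches + num_non_matches
    p = num_matches / total          # estimated match proportion
    purity = max(p, 1.0 - p)         # fraction of the majority class
    entropy = 0.0                    # binary Shannon entropy in bits
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy, p

purity, entropy, prop = cluster_stats(28, 58)
```

These sample statistics are then carried over as the purity, entropy, and estimated match proportion of the sub-clusters in the queue (see Loop 2 below).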
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 756 weight vectors
Based on 28 matches and 58 non-matches
Classified 149 matches and 607 non-matches
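The split step trains an SVM on the oracle-labelled sample and partitions the remaining vectors by its predictions. A minimal sketch with scikit-learn's `SVC` (the kernel, parameters, and toy data here are illustrative assumptions; the log does not show the actual classifier settings):

```python
from sklearn.svm import SVC

# Oracle-labelled weight vectors (toy stand-ins for the
# 28 matches / 58 non-matches in the training sets above)
X_train = [[0.9, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.1]]
y_train = [1, 1, 0, 0]  # 1 = match, 0 = non-match

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Remaining (unlabelled) vectors in the cluster are split into a
# predicted-match and a predicted-non-match sub-cluster
X_rest = [[0.85, 0.95], [0.15, 0.15]]
pred = clf.predict(X_rest)
```

The two predicted groups become the new queue entries in the next loop, each inheriting the sample's purity and entropy estimates.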
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(607, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 607 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 607 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 1 match and 73 non-matches
Purity of oracle classification: 0.986
Entropy of oracle classification: 0.103
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing the file: diverg(20)670_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 670), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)670_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 118 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 118 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)779_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 779), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)779_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 566 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 566 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 7 matches and 66 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.456
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)222_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 222), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)222_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1038 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.01 sec
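The deduplication and pureness-filtering step logged above (one unique vector with pureness 0.950, one minority-class copy removed) can be sketched as follows; the grouping strategy and data layout are assumptions, since the log only shows the step's output:

```python
def pureness_filter(vectors, labels):
    """Group identical weight vectors, compute each unique vector's
    pureness (fraction of its copies that are matches), and drop the
    minority-class copies of any non-pure vector. A sketch; the actual
    data structures used by the program are not shown in the log.
    """
    groups = {}
    for vec, is_match in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(is_match)

    kept, removed = [], 0
    for vec, copies in groups.items():
        pureness = sum(copies) / len(copies)
        if pureness in (0.0, 1.0):              # pure: keep every copy
            kept += [(vec, c) for c in copies]
        else:                                   # non-pure: keep majority class
            majority = sum(copies) * 2 > len(copies)
            keep = [c for c in copies if c == majority]
            removed += len(copies) - len(keep)
            kept += [(vec, c) for c in keep]
    return kept, removed

# One vector occurring 20 times with 19 matches (pureness 0.950):
# the single minority-class copy is removed.
kept, removed = pureness_filter([[0.1]] * 3 + [[0.5]] * 20,
                                [False] * 3 + [True] * 19 + [False])
print(removed, len(kept))  # -> 1 22
```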
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
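The farthest-first selection used in each sampling round can be sketched as a greedy traversal that repeatedly adds the vector with the largest minimum distance to the already-selected set. The starting vector and the Euclidean metric are assumptions not confirmed by the log:

```python
from math import dist

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal of k vectors.

    The start index and Euclidean distance are assumptions.
    """
    selected = [start]
    min_d = [dist(v, vectors[start]) for v in vectors]  # distance to selected set
    for _ in range(k - 1):
        nxt = max(range(len(vectors)), key=min_d.__getitem__)  # farthest point
        selected.append(nxt)
        min_d = [min(d, dist(v, vectors[nxt])) for d, v in zip(min_d, vectors)]
    return selected

# 1-D toy example: after index 0, the farthest point is 10.0, then 5.0:
print(farthest_first([[0.0], [1.0], [10.0], [5.0]], 3))  # -> [0, 2, 3]
```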
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
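The purity, entropy, and estimated match proportion reported after each oracle round follow directly from the match/non-match counts of the classified sample. A sketch that reproduces the values logged above for 23 matches and 65 non-matches:

```python
from math import log2

def cluster_stats(num_matches, num_non_matches):
    """Purity (majority-class share), binary entropy, and estimated
    match proportion of a classified sample of weight vectors."""
    total = num_matches + num_non_matches
    p = num_matches / total                      # estimated match proportion
    purity = max(p, 1 - p)                       # share of the majority class
    entropy = -sum(q * log2(q) for q in (p, 1 - p) if q > 0)
    return purity, entropy, p

# 23 matches and 65 non-matches, as in the oracle round above:
purity, entropy, prop = cluster_stats(23, 65)
print(round(purity, 3), round(entropy, 3), round(prop, 3))  # -> 0.739 0.829 0.261
```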
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 103 matches and 847 non-matches
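The SVM step trains on the oracle-labelled sample and classifies the cluster's remaining, unlabelled weight vectors. A minimal sketch with hypothetical stand-in data; in the run above the training set is the 88 oracle-labelled vectors and the unlabelled set the 950 remaining ones, and the kernel choice here is an assumption:

```python
from sklearn.svm import SVC

# Hypothetical 2-D stand-in for 7-D weight vectors (not from the log).
train_X = [[0.1, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
train_y = [0, 0, 1, 1]                  # 0 = non-match, 1 = match

clf = SVC(kernel="linear")              # kernel choice is an assumption
clf.fit(train_X, train_y)
pred = clf.predict([[0.95, 0.9], [0.15, 0.05]])
print(pred.tolist())  # -> [1, 0]
```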
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(10)380_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990385
recall 0.344482
f-measure 0.511166
da 104
dm 0
ndm 0
tp 103
fp 1
tn 4.76529e+07
fn 196
Name: (10, 1 - acm diverg, 380), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)380_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 720
Number of entity ID pairs that occurred more than once: 0
Analyse set of 720 weight vectors
Containing 160 true matches and 560 true non-matches
(22.22% true matches)
Identified 699 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 688 (98.43%)
2 : 8 (1.14%)
3 : 2 (0.29%)
10 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 699 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 141
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 557
Removed 1 non-pure weight vector
Final number of weight vectors to use: 719
Number of unique weight vectors: 699
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(699, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 699 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 699 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 615 weight vectors
Based on 31 matches and 53 non-matches
Classified 115 matches and 500 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(115, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(500, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 115 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 115 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 41 matches and 10 non-matches
Purity of oracle classification: 0.804
Entropy of oracle classification: 0.714
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 10
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
104.0
Analysing file: diverg(20)13_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (20, 1 - acm diverg, 13), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)13_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1041
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1041 weight vectors
Containing 213 true matches and 828 true non-matches
(20.46% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 954 (96.46%)
2 : 32 (3.24%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 807
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1040
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
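The purity and entropy figures reported for each oracle round can be reproduced from the match/non-match counts: purity is the majority-class fraction and entropy the binary Shannon entropy (base 2). A minimal sketch (the function name is illustrative):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity: majority-class fraction. Entropy: binary Shannon entropy (base 2)."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# The oracle round above classified 26 matches and 61 non-matches:
purity, entropy = purity_entropy(26, 61)  # purity ~0.701, entropy ~0.880, as logged
```

The estimated match proportion reported for the resulting clusters (0.2988...) is simply the minority fraction 26/87 from the same counts.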
SVM classification of 902 weight vectors
Based on 26 matches and 61 non-matches
Classified 109 matches and 793 non-matches
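The SVM step trains on the oracle-labelled vectors (here 26 matches and 61 non-matches) and splits the remaining unlabelled vectors of the cluster into predicted matches and non-matches. A minimal sketch with scikit-learn, using toy stand-in data; the kernel and all parameters are assumptions, since the log does not state them:

```python
from sklearn.svm import SVC

# Toy stand-ins for the oracle-labelled similarity weight vectors:
# label 1 = match, 0 = non-match.
train_X = [[0.9, 0.9], [0.8, 1.0], [1.0, 0.8], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
train_y = [1, 1, 1, 0, 0, 0]

clf = SVC(kernel="linear")  # assumed kernel; not given in the log
clf.fit(train_X, train_y)

# Classify the unlabelled remainder of the cluster into two sub-clusters
remainder = [[0.85, 0.95], [0.15, 0.05]]
pred = clf.predict(remainder)
matches = [v for v, p in zip(remainder, pred) if p == 1]
non_matches = [v for v, p in zip(remainder, pred) if p == 0]
```

The two resulting sub-clusters (predicted matches and predicted non-matches) are then placed back on the queue, which is why the queue length grows to 2 in the next loop.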
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 109 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
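The "farthest first" samples shown throughout this log are consistent with a greedy farthest-first traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest, which spreads the sample over the whole cluster. A minimal sketch; the Euclidean distance and the seeding rule are assumptions:

```python
import math

def farthest_first(vectors, k):
    """Greedily select k vectors, each maximising its distance to the
    nearest already-selected vector (farthest-first traversal)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # assumed seed: the first vector
    while len(selected) < k:
        nxt = max(vectors, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(nxt)
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)], 3)
```

With the toy points above, the selection jumps to the opposite corner first and then to the remaining far corner, skipping the interior point.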
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 47 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analyzing file: diverg(15)442_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (15, 1 - acm diverg, 442), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)442_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 954
Number of entity ID pairs that occurred more than once: 0
Analyse set of 954 weight vectors
Containing 205 true matches and 749 true non-matches
(21.49% true matches)
Identified 903 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 869 (96.23%)
2 : 31 (3.43%)
3 : 2 (0.22%)
17 : 1 (0.11%)
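The occurrence histogram above (e.g. one vector occurring 17 times) is a two-level count over the weight vectors: first count how often each distinct vector occurs, then count how many vectors share each occurrence count. A minimal illustration with toy data:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of distinct vectors
    occurring exactly that often."""
    per_vector = Counter(tuple(v) for v in vectors)
    return dict(Counter(per_vector.values()))

vecs = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.9], [0.9, 0.9], [0.9, 0.9]]
dist = occurrence_distribution(vecs)  # one vector twice, one once, one three times
```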
Identified 1 non-pure unique weight vector (from 903 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 728
Removed 1 non-pure weight vector
Final number of weight vectors to use: 953
Number of unique weight vectors: 903
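The pureness filter groups identical weight vectors, computes the match fraction per group, and removes the minority-class copies of any non-pure group. Here one vector occurred 16 times as a match and once as a non-match (pureness 16/17 ≈ 0.941), so the single non-match copy was removed (954 → 953). A minimal sketch; the function name and the tie-break rule are assumptions:

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """Drop minority-class copies of any weight vector that occurs
    with both labels; pure vectors are kept untouched."""
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, labels in groups.items():
        majority = sum(labels) / len(labels) >= 0.5  # assumed tie-break
        kept.extend((list(vec), lab) for lab in labels if lab == majority)
    return kept

# One vector seen 16 times as a match and once as a non-match (pureness 0.941):
data = [([1.0, 1.0], True)] * 16 + [([1.0, 1.0], False)]
kept = remove_minority_copies(data)  # the single non-match copy is removed
```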
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(903, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 903 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 903 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 816 weight vectors
Based on 24 matches and 63 non-matches
Classified 112 matches and 704 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(704, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 112 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 46
Farthest first selection of 46 weight vectors from 112 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
Perform oracle with 100.00% accuracy on 46 weight vectors
The oracle will correctly classify 46 weight vectors and wrongly classify 0
Classified 46 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 46 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analyzing file: diverg(15)592_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 592), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)592_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 794
Number of entity ID pairs that occurred more than once: 0
Analyse set of 794 weight vectors
Containing 213 true matches and 581 true non-matches
(26.83% true matches)
Identified 758 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 741 (97.76%)
2 : 14 (1.85%)
3 : 2 (0.26%)
19 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 758 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 578
Removed 1 non-pure weight vector
Final number of weight vectors to use: 793
Number of unique weight vectors: 758
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(758, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 758 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 758 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.429, 0.786, 0.750, 0.389, 0.857] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 673 weight vectors
Based on 30 matches and 55 non-matches
Classified 136 matches and 537 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(537, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 136 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 136 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 51 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.232
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analyzing file: diverg(10)400_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979592
recall 0.32107
f-measure 0.483627
da 98
dm 0
ndm 0
tp 96
fp 2
tn 4.76529e+07
fn 203
Name: (10, 1 - acm diverg, 400), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)400_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 687
Number of entity ID pairs that occurred more than once: 0
Analyse set of 687 weight vectors
Containing 167 true matches and 520 true non-matches
(24.31% true matches)
Identified 650 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 619 (95.23%)
2 : 28 (4.31%)
3 : 2 (0.31%)
6 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 650 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 150
0.000 : 500
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 687
Number of unique weight vectors: 650
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(650, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 650 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 650 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
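The "far" initial selection above is consistent with a classic farthest-first traversal: repeatedly pick the vector whose nearest already-selected vector is farthest away. A minimal sketch under that assumption (the seed choice and Euclidean metric here are illustrative, not the program's confirmed configuration):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly pick the vector with the
    largest distance to its nearest already-selected vector."""
    selected = [vectors[0]]  # seed with the first vector (seed choice is an assumption)
    while len(selected) < k:
        best = max(
            (v for v in vectors if v not in selected),
            # distance to the closest already-selected vector (Euclidean, assumed)
            key=lambda v: min(math.dist(v, s) for s in selected),
        )
        selected.append(best)
    return selected

points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(farthest_first(points, 3))
```

This greedy scheme favours spread-out samples, which matches the mix of extreme (all-1.0 / all-0.0) and intermediate vectors seen in the selections above.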
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 31 matches and 52 non-matches
Purity of oracle classification: 0.627
Entropy of oracle classification: 0.953
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 567 weight vectors
Based on 31 matches and 52 non-matches
Classified 260 matches and 307 non-matches
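The SVM step trains on the oracle-labelled sample and splits the remaining vectors into two child clusters (predicted matches and non-matches). A sketch using scikit-learn; the kernel and synthetic data are assumptions, not the program's actual configuration:

```python
import numpy as np
from sklearn.svm import SVC

def split_cluster(labelled_vectors, labels, unlabelled_vectors):
    """Train an SVM on the oracle-labelled sample, then split the remaining
    vectors into predicted matches (1) and non-matches (0)."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(labelled_vectors, labels)
    preds = clf.predict(unlabelled_vectors)
    return unlabelled_vectors[preds == 1], unlabelled_vectors[preds == 0]

# Synthetic 7-dimensional weight vectors standing in for the logged data
rng = np.random.default_rng(0)
train = np.vstack([rng.uniform(0.6, 1.0, (20, 7)),   # match-like vectors
                   rng.uniform(0.0, 0.4, (20, 7))])  # non-match-like vectors
y = np.array([1] * 20 + [0] * 20)
rest = rng.uniform(0.0, 1.0, (10, 7))
m, n = split_cluster(train, y, rest)
print(len(m), len(n))
```

The two predicted groups then re-enter the queue as separate clusters, which is why the queue length grows to 2 in the next loop.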
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(260, 0.6265060240963856, 0.9533171305598173, 0.37349397590361444)
(307, 0.6265060240963856, 0.9533171305598173, 0.37349397590361444)
Current size of match and non-match training data sets: 31 / 52
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 307 weight vectors
- Estimated match proportion 0.373
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 307 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.000, 0.600, 0.818, 0.571, 0.524] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.556, 0.348, 0.467, 0.636, 0.412] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.538, 0.600, 0.471, 0.632, 0.688] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.741, 0.556, 0.667, 0.350, 0.556] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[0.800, 0.000, 0.444, 0.545, 0.333, 0.111, 0.533] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.429, 0.571, 0.333, 0.444, 0.400] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.462, 0.667, 0.636, 0.368, 0.500] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.583, 0.786, 0.842, 0.800, 0.833] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.750, 0.533, 0.294, 0.333, 0.429] (False)
[1.000, 0.000, 0.700, 0.833, 0.524, 0.636, 0.238] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 0 matches and 69 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
98.0
Analysing the file: diverg(15)664_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 664), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)664_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1081
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1081 weight vectors
Containing 226 true matches and 855 true non-matches
(20.91% true matches)
Identified 1024 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 987 (96.39%)
2 : 34 (3.32%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1024 unique weight vectors)
Pureness (as a percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 834
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1080
Number of unique weight vectors: 1024
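The non-pure handling above (one unique vector with pureness 0.950, one vector removed, 1081 → 1080) is consistent with keeping the majority-label copies of each duplicated weight vector and dropping the minority copies. A sketch under that assumption, where "pureness" is the majority-label fraction among identical vectors:

```python
from collections import Counter

def remove_minority_copies(vectors, labels):
    """For each distinct weight vector, keep only the copies carrying its
    majority label; drop the minority-label copies (the 'non-pure' ones)."""
    per_vector = {}
    for vec, lab in zip(map(tuple, vectors), labels):
        per_vector.setdefault(vec, Counter())[lab] += 1
    majority = {vec: cnt.most_common(1)[0][0] for vec, cnt in per_vector.items()}
    return [(v, l) for v, l in zip(vectors, labels) if majority[tuple(v)] == l]

# One vector occurring 20 times with 19 True / 1 False (pureness 19/20 = 0.950):
vecs = [[0.5, 0.5]] * 20
labs = [True] * 19 + [False]
print(len(remove_minority_copies(vecs, labs)))  # 19 copies kept, 1 removed
```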
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1024, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1024 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1024 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 937 weight vectors
Based on 30 matches and 57 non-matches
Classified 173 matches and 764 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(173, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(764, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 764 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 78
Farthest first selection of 78 weight vectors from 764 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 1 match and 77 non-matches
Purity of oracle classification: 0.987
Entropy of oracle classification: 0.099
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)514_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 514), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)514_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 401
Number of entity ID pairs that occurred more than once: 0
Analyse set of 401 weight vectors
Containing 219 true matches and 182 true non-matches
(54.61% true matches)
Identified 368 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 352 (95.65%)
2 : 13 (3.53%)
3 : 2 (0.54%)
17 : 1 (0.27%)
Identified 1 non-pure unique weight vector (from 368 unique weight vectors)
Pureness (as a percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 181
Removed 1 non-pure weight vector
Final number of weight vectors to use: 400
Number of unique weight vectors: 368
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(368, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 368 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 368 vectors
The selected farthest weight vectors are:
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 44 matches and 32 non-matches
Purity of oracle classification: 0.579
Entropy of oracle classification: 0.982
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 32
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 292 weight vectors
Based on 44 matches and 32 non-matches
Classified 154 matches and 138 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(154, 0.5789473684210527, 0.9819407868640977, 0.5789473684210527)
(138, 0.5789473684210527, 0.9819407868640977, 0.5789473684210527)
Current size of match and non-match training data sets: 44 / 32
Selected cluster with (queue ordering: random):
- Purity 0.58 and entropy 0.98
- Size 138 weight vectors
- Estimated match proportion 0.579
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 138 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[0.512, 1.000, 0.087, 0.190, 0.107, 0.226, 0.204] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.821, 1.000, 0.275, 0.297, 0.227, 0.255, 0.152] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.261, 0.174, 0.148, 0.186, 0.148] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.881, 1.000, 0.211, 0.250, 0.129, 0.250, 0.211] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.750, 1.000, 0.214, 0.184, 0.250, 0.267, 0.111] (False)
[0.850, 1.000, 0.179, 0.205, 0.188, 0.061, 0.180] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.750, 1.000, 0.146, 0.130, 0.176, 0.318, 0.167] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.750, 1.000, 0.243, 0.243, 0.214, 0.111, 0.132] (False)
[0.929, 1.000, 0.250, 0.193, 0.250, 0.164, 0.213] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.592, 1.000, 0.179, 0.205, 0.156, 0.273, 0.180] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.663, 1.000, 0.273, 0.244, 0.226, 0.196, 0.238] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.600, 0.944, 0.250, 0.200, 0.186, 0.136, 0.118] (False)
[0.663, 1.000, 0.132, 0.143, 0.241, 0.174, 0.167] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.902, 1.000, 0.182, 0.071, 0.182, 0.222, 0.190] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.747, 1.000, 0.231, 0.167, 0.107, 0.222, 0.125] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
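The farthest-first selections above can be sketched as a greedy traversal: each new vector is the one whose distance to its nearest already-selected vector is largest. A minimal sketch (the seed choice and Euclidean metric are assumptions):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly select the vector
    whose distance to the closest already-selected vector is largest."""
    selected = [start]
    # min_dist[i] = distance from vectors[i] to its nearest selected vector
    min_dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        next_idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(next_idx)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[next_idx]))
    return [vectors[i] for i in selected]
```

This explains why the selected samples are so diverse: near-duplicate vectors keep a small `min_dist` and are never picked while more distant ones remain.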
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 2 matches and 54 non-matches
Purity of oracle classification: 0.964
Entropy of oracle classification: 0.222
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
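The oracle here runs at 100.00% accuracy, so no labels are flipped; in general an oracle with accuracy a returns each true label with probability a. A minimal simulation sketch (the function name is my own, not from the original code):

```python
import random

def noisy_oracle(true_labels, accuracy, rng=None):
    """Return each true label with probability `accuracy`,
    otherwise flip it; accuracy=1.0 never flips a label."""
    rng = rng or random.Random(0)
    return [lbl if rng.random() < accuracy else not lbl
            for lbl in true_labels]
```

At accuracy 1.0 this reproduces the logged behaviour of zero false matches and zero false non-matches.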
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)616_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 616), dtype: object
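The per-file summary values (precision, recall, f-measure) are consistent with F1 computed from the confusion counts; e.g. for tp=57, fp=0, fn=242 above:

```python
def f_measure(tp, fp, fn):
    """F1: harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(57, 0, 242), 6))  # 0.320225
```

The same formula reproduces the f-measure of the other per-file summaries in this log.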
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)616_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 790
Number of entity ID pairs that occurred more than once: 0
Analyse set of 790 weight vectors
Containing 208 true matches and 582 true non-matches
(26.33% true matches)
Identified 761 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 744 (97.77%)
2 : 14 (1.84%)
3 : 2 (0.26%)
12 : 1 (0.13%)
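The occurrence frequency distribution above is a count-of-counts over the weight vectors; a minimal sketch (the helper name is assumed):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of distinct weight vectors
    that occur that many times."""
    per_vector = Counter(map(tuple, weight_vectors))
    return Counter(per_vector.values())
```

For this file the result would contain 744 vectors occurring once, 14 twice, 2 three times, and 1 twelve times.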
Identified 1 non-pure unique weight vector (from 761 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 789
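Pureness per unique weight vector is the fraction of its occurrences labelled as true matches (e.g. 0.917 above is consistent with 11 matches out of 12 occurrences). A sketch of the computation (the helper name is assumed):

```python
from collections import defaultdict

def pureness_per_vector(weight_vectors, is_match):
    """Fraction of occurrences of each unique weight vector that are
    labelled as true matches (1.0 or 0.0 for pure vectors)."""
    counts = defaultdict(lambda: [0, 0])  # vec -> [match count, total]
    for vec, label in zip(weight_vectors, is_match):
        key = tuple(vec)
        counts[key][0] += int(label)
        counts[key][1] += 1
    return {k: m / t for k, (m, t) in counts.items()}
```

Vectors whose pureness is strictly between 0 and 1 carry conflicting labels, which is why their minority-class copies are removed before clustering.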
Number of unique weight vectors: 761
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(761, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 761 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 761 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 676 weight vectors
Based on 28 matches and 57 non-matches
Classified 133 matches and 543 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 133 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 133 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.420, 1.000, 1.000, 1.000, 1.000, 1.000, 0.947] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)410_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 410), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)410_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 271
Number of entity ID pairs that occurred more than once: 0
Analyse set of 271 weight vectors
Containing 163 true matches and 108 true non-matches
(60.15% true matches)
Identified 254 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 246 (96.85%)
2 : 5 (1.97%)
3 : 2 (0.79%)
9 : 1 (0.39%)
Identified 1 non-pure unique weight vector (from 254 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 146
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 107
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 262
Number of unique weight vectors: 253
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(253, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 253 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 70
Perform initial selection using "far" method
Farthest first selection of 70 weight vectors from 253 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 34 matches and 36 non-matches
Purity of oracle classification: 0.514
Entropy of oracle classification: 0.999
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 36
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 183 weight vectors
Based on 34 matches and 36 non-matches
Classified 121 matches and 62 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 70
Size, purity, entropy, and estimated match proportion of clusters in queue:
(121, 0.5142857142857142, 0.9994110647387553, 0.4857142857142857)
(62, 0.5142857142857142, 0.9994110647387553, 0.4857142857142857)
Current size of match and non-match training data sets: 34 / 36
Selected cluster with (queue ordering: random):
- Purity 0.51 and entropy 1.00
- Size 121 weight vectors
- Estimated match proportion 0.486
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 121 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 45 matches and 9 non-matches
Purity of oracle classification: 0.833
Entropy of oracle classification: 0.650
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing file: diverg(20)190_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 190), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)190_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
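The purity and entropy figures reported for each oracle sample follow the usual cluster-quality definitions: purity is the majority-class fraction of the labelled sample, and entropy is the binary Shannon entropy of the match proportion. A minimal sketch (the helper name `purity_entropy` is hypothetical, not from the tool):

```python
import math

def purity_entropy(num_matches: int, num_non_matches: int):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total          # match proportion
    purity = max(p, 1.0 - p)        # majority-class fraction
    # Binary entropy; 0 * log2(0) is taken as 0 for pure samples.
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Counts from the oracle sample above: 24 matches, 63 non-matches.
purity, entropy = purity_entropy(24, 63)
print(round(purity, 3), round(entropy, 3))  # 0.724 0.85
```

These reproduce the "Purity of oracle classification: 0.724" and "Entropy of oracle classification: 0.850" lines above.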
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
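The oracle-labelled sample (24 matches, 63 non-matches) is then used as training data to split the remaining 829 vectors of the cluster into a predicted-match and a predicted-non-match sub-cluster. The log uses an SVM for this step; the sketch below substitutes a simpler nearest-centroid rule purely to illustrate the train-then-split pattern (all names are hypothetical, and this is not the tool's actual classifier):

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length weight vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def split_cluster(matches, non_matches, unlabelled):
    """Assign each unlabelled vector to the closer class centroid
    (a stand-in for the SVM decision boundary used in the log)."""
    cm, cn = centroid(matches), centroid(non_matches)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    pred_matches, pred_non = [], []
    for v in unlabelled:
        (pred_matches if dist2(v, cm) < dist2(v, cn) else pred_non).append(v)
    return pred_matches, pred_non

# Toy example with two obvious classes.
m, n = [[0.9, 1.0], [1.0, 0.8]], [[0.1, 0.0], [0.0, 0.2]]
pm, pn = split_cluster(m, n, [[0.95, 0.9], [0.05, 0.1]])
print(len(pm), len(pn))  # 1 1
```

With scikit-learn available, `svm.SVC().fit(X, y)` followed by `predict` on the unlabelled vectors would play the same role.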
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
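Farthest-first selection greedily adds, at each step, the vector whose Euclidean distance to the already-selected set is largest, spreading the sample across the cluster. A minimal sketch, assuming an arbitrary first pick (the function name is hypothetical):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector and
    repeatedly add the vector farthest from the selected set."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # A candidate's distance to the set is its distance
        # to the nearest already-selected vector.
        def d_to_selected(v):
            return min(math.dist(v, s) for s in selected)
        best = max(remaining, key=d_to_selected)
        remaining.remove(best)
        selected.append(best)
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.5, 0.5)]
print(farthest_first(pts, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
```

Note how the near-duplicate `(0.1, 0.0)` is skipped: the method favours diverse samples over dense ones.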
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(15)370_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 370), dtype: object
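The precision, recall, and f-measure printed with each run follow directly from the tp/fp/fn counts in the same Series: recall = tp / (tp + fn) = 40 / (40 + 259) ≈ 0.133779. A quick check (helper name hypothetical):

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from raw confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts from the Series above: tp=40, fp=0, fn=259.
p, r, f = prf(40, 0, 259)
print(round(p, 6), round(r, 6), round(f, 6))  # 1.0 0.133779 0.235988
```

The same identities reproduce the figures of the later runs (e.g. tp=67, fp=1, fn=232 gives precision 0.985294, recall 0.22408).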
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)370_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 910
Number of entity ID pairs that occurred more than once: 0
Analyse set of 910 weight vectors
Containing 214 true matches and 696 true non-matches
(23.52% true matches)
Identified 855 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 819 (95.79%)
2 : 33 (3.86%)
3 : 2 (0.23%)
19 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 855 unique weight vectors)
Pureness (fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 179
0.947 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 675
Removed 1 non-pure weight vector
Final number of weight vectors to use: 909
Number of unique weight vectors: 855
Time to load and analyse the weight vector file: 0.01 sec
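Non-pure unique weight vectors (identical vectors carrying both match and non-match labels) are cleaned before clustering. In the run above, the 0.947-pure vector (18 of 19 copies labelled match) loses only its 1 minority-class copy, whereas a later run drops all 8 copies of a 0.875-pure vector. A sketch of such a cleaning rule; the 0.9 threshold is an assumption inferred from these two cases, not taken from the log:

```python
def copies_to_remove(match_count: int, non_match_count: int,
                     purity_threshold: float = 0.9):
    """How many copies of a duplicated weight vector to drop.

    If the vector is pure enough, drop only the minority-class copies;
    otherwise drop every copy. The 0.9 threshold is hypothetical.
    """
    total = match_count + non_match_count
    pureness = max(match_count, non_match_count) / total
    if pureness == 1.0:
        return 0                                  # already pure: keep all
    if pureness >= purity_threshold:
        return min(match_count, non_match_count)  # minority copies only
    return total                                  # too mixed: drop all copies

print(copies_to_remove(18, 1))  # 1  (pureness 0.947: minority copy removed)
print(copies_to_remove(7, 1))   # 8  (pureness 0.875: all copies removed)
```

Both outputs match the "Removed ..." counts reported in the corresponding runs.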
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(855, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 855 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 855 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 30 matches and 56 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 769 weight vectors
Based on 30 matches and 56 non-matches
Classified 199 matches and 570 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(199, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
(570, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
Current size of match and non-match training data sets: 30 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 570 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 570 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.808, 0.478, 0.636, 0.786, 0.500] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 0 matches and 76 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 76
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)129_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (10, 1 - acm diverg, 129), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)129_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 359
Number of entity ID pairs that occurred more than once: 0
Analyse set of 359 weight vectors
Containing 191 true matches and 168 true non-matches
(53.20% true matches)
Identified 338 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 324 (95.86%)
2 : 11 (3.25%)
3 : 2 (0.59%)
7 : 1 (0.30%)
Identified 0 non-pure unique weight vectors (from 338 unique weight vectors)
Pureness (fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 170
0.000 : 168
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 359
Number of unique weight vectors: 338
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(338, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 338 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 75
Perform initial selection using "far" method
Farthest first selection of 75 weight vectors from 338 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 43 matches and 32 non-matches
Purity of oracle classification: 0.573
Entropy of oracle classification: 0.984
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 32
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 263 weight vectors
Based on 43 matches and 32 non-matches
Classified 128 matches and 135 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 75
Size, purity, entropy, and estimated match proportion of clusters in queue:
(128, 0.5733333333333334, 0.9844268978000114, 0.5733333333333334)
(135, 0.5733333333333334, 0.9844268978000114, 0.5733333333333334)
Current size of match and non-match training data sets: 43 / 32
Selected cluster (queue ordering: random) with:
- Purity 0.57 and entropy 0.98
- Size 135 weight vectors
- Estimated match proportion 0.573
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 135 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.146, 0.130, 0.176, 0.318, 0.167] (False)
[0.750, 1.000, 0.222, 0.095, 0.167, 0.139, 0.278] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.821, 1.000, 0.275, 0.297, 0.227, 0.255, 0.152] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.600, 0.944, 0.250, 0.200, 0.186, 0.136, 0.118] (False)
[0.881, 1.000, 0.211, 0.250, 0.129, 0.250, 0.211] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.750, 1.000, 0.214, 0.184, 0.250, 0.267, 0.111] (False)
[0.850, 1.000, 0.179, 0.205, 0.188, 0.061, 0.180] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.512, 1.000, 0.087, 0.190, 0.107, 0.226, 0.204] (False)
[0.750, 1.000, 0.243, 0.243, 0.214, 0.111, 0.132] (False)
[0.929, 1.000, 0.250, 0.193, 0.250, 0.164, 0.213] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.635, 1.000, 0.179, 0.265, 0.167, 0.121, 0.241] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.663, 1.000, 0.273, 0.244, 0.226, 0.196, 0.238] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.592, 1.000, 0.179, 0.205, 0.156, 0.273, 0.180] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.902, 1.000, 0.182, 0.071, 0.182, 0.222, 0.190] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.663, 1.000, 0.132, 0.143, 0.241, 0.174, 0.167] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[0.747, 1.000, 0.231, 0.167, 0.107, 0.222, 0.125] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 3 matches and 53 non-matches
Purity of oracle classification: 0.946
Entropy of oracle classification: 0.301
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(10)339_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 339), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)339_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 890
Number of entity ID pairs that occurred more than once: 0
Analyse set of 890 weight vectors
Containing 154 true matches and 736 true non-matches
(17.30% true matches)
Identified 854 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 826 (96.72%)
2 : 25 (2.93%)
3 : 2 (0.23%)
8 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 854 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 715
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 882
Number of unique weight vectors: 853
Time to load and analyse the weight vector file: 0.01 sec
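The analysis step above (frequency distribution of duplicate weight vectors and per-vector pureness) can be sketched as follows. This is a minimal reconstruction, not the tool's actual code; it assumes each weight vector arrives as a `(tuple, is_match)` pair, and all names are illustrative:

```python
from collections import Counter, defaultdict

def analyse_weight_vectors(vectors):
    """vectors: list of (weight_tuple, is_match) pairs."""
    freq = Counter(w for w, _ in vectors)
    # Frequency distribution: occurrence count -> number of unique vectors
    dist = Counter(freq.values())
    # Pureness: fraction of true matches among the duplicates of each
    # unique weight vector (1.0 and 0.0 are pure, anything else is not)
    match_count = defaultdict(int)
    for w, is_match in vectors:
        match_count[w] += int(is_match)
    pureness = {w: match_count[w] / freq[w] for w in freq}
    return dist, pureness

# Toy usage: one vector occurs twice with mixed labels -> pureness 0.5
vecs = [((1.0, 0.5), True), ((1.0, 0.5), False), ((0.2, 0.1), False)]
dist, pureness = analyse_weight_vectors(vecs)
```

Non-pure unique vectors (pureness strictly between 0 and 1) are the ones the log reports as removed before clustering.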
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(853, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 853 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 853 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
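The farthest-first selection used above can be sketched as a greedy traversal: repeatedly pick the vector whose distance to the closest already-selected vector is largest. This is a sketch assuming Euclidean distance; the actual tool's seeding and tie-breaking rules are not visible in the log:

```python
import math
import random

def farthest_first(vectors, k, seed=0):
    """Greedily select k vectors, each maximising its distance
    to the nearest already-selected vector (farthest-first traversal)."""
    rng = random.Random(seed)
    selected = [rng.choice(vectors)]  # arbitrary starting point
    # d[j] = distance from vectors[j] to its nearest selected vector
    d = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=d.__getitem__)
        selected.append(vectors[i])
        d = [min(d[j], math.dist(vectors[j], vectors[i]))
             for j in range(len(vectors))]
    return selected

# Small demo: the far-away point is always among the two selected
sel = farthest_first([(0.0, 0.0), (0.1, 0.0), (10.0, 0.0)], 2)
```

This favours spread-out samples, which is why the selections above mix clear matches, clear non-matches, and borderline vectors.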
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
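The purity and entropy figures reported for each oracle classification are the majority-class fraction and the binary entropy of the match proportion. A minimal sketch, reproducing the Loop 1 values above (24 matches, 62 non-matches give purity 0.721 and entropy 0.854):

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity = majority-class fraction of the classified sample;
    entropy = binary entropy of the match proportion
    (0.0 for a pure cluster, 1.0 for a 50/50 split)."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1.0 - p)
    if p in (0.0, 1.0):
        entropy = 0.0
    else:
        entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return purity, entropy

purity, entropy = purity_entropy(24, 62)  # the oracle result above
```

The same formula explains the initial queue entry `(853, 0.5, 1.0, 0.5)`: an unexplored cluster starts at purity 0.5 and entropy 1.0.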
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 767 weight vectors
Based on 24 matches and 62 non-matches
Classified 45 matches and 722 non-matches
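The SVM split step trains on the oracle-labelled vectors and partitions the remaining unlabelled vectors into a predicted-match and a predicted-non-match cluster. As a self-contained stand-in for the tool's SVM (whose library, kernel, and parameters the log does not reveal), here is a minimal linear SVM trained by sub-gradient descent on the hinge loss; all parameter values are illustrative:

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1, seed=0):
    """Minimal linear SVM via sub-gradient descent on the hinge loss.
    X: list of feature tuples; y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularisation step
            if margin < 1.0:  # inside the margin: hinge sub-gradient update
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def classify(w, b, x):
    """+1 = predicted match, -1 = predicted non-match."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1

# Toy split: matches near (1,1), non-matches near the origin
X = [(1.0, 1.0), (0.9, 1.0), (0.0, 0.1), (0.1, 0.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

Each predicted partition then re-enters the cluster queue, as seen in Loop 2 where the queue holds the 45 predicted matches and 722 predicted non-matches.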
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(45, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(722, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 45 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 29
Farthest first selection of 29 weight vectors from 45 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.929, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 1.000, 1.000, 0.952, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
Perform oracle with 100.00% accuracy on 29 weight vectors
The oracle will correctly classify 29 weight vectors and wrongly classify 0
Classified 29 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 29 weight vectors (classified by oracle) from cluster
Cluster is pure enough and not too large, add its 45 weight vectors to:
Match training set
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 3: Queue length: 1
Number of manual oracle classifications performed: 115
Size, purity, entropy, and estimated match proportion of clusters in queue:
(722, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 69 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 722 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 722 vectors
The selected farthest weight vectors are:
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 15 matches and 55 non-matches
Purity of oracle classification: 0.786
Entropy of oracle classification: 0.750
Number of true matches: 15
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing file: diverg(10)997_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 997), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)997_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 697
Number of entity ID pairs that occurred more than once: 0
Analyse set of 697 weight vectors
Containing 203 true matches and 494 true non-matches
(29.12% true matches)
Identified 646 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 612 (94.74%)
2 : 31 (4.80%)
3 : 2 (0.31%)
17 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 646 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 473
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 696
Number of unique weight vectors: 646
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(646, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 646 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 646 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 563 weight vectors
Based on 28 matches and 55 non-matches
Classified 153 matches and 410 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(410, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 410 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 410 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[1.000, 0.000, 0.700, 0.429, 0.476, 0.647, 0.810] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 2 matches and 69 non-matches
Purity of oracle classification: 0.972
Entropy of oracle classification: 0.185
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing file: diverg(20)73_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 73), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)73_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
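The "far" selection logged above is a farthest-first traversal; a minimal sketch of the idea (assuming Euclidean distance and an arbitrary first pick — the seeding rule and metric are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, first=0):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    distance to its nearest already-selected vector is largest."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [first]
    # distance from every vector to the closest selected vector so far
    min_dist = np.linalg.norm(vectors - vectors[first], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))          # most isolated vector
        selected.append(nxt)
        dist_to_new = np.linalg.norm(vectors - vectors[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected
```

This spreads the sample across the weight-vector space, which is why the selected vectors above mix clear matches (values near 1.0) with clear non-matches.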
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
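The purity, entropy, and estimated match proportion reported for each cluster appear to follow the usual definitions over the oracle-labelled counts (the 26 matches / 61 non-matches just reported reproduce the queue entries printed in the next loop); a minimal sketch:

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity, binary entropy, and estimated match proportion
    from oracle-classified match / non-match counts."""
    total = num_matches + num_non_matches
    p = num_matches / total                  # estimated match proportion
    purity = max(p, 1.0 - p)                 # fraction in the majority class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)      # binary (Shannon) entropy
    return purity, entropy, p

# cluster_stats(26, 61) gives purity ~0.7011, entropy ~0.8799,
# match proportion ~0.2989 -- the values in the Loop 2 queue.
```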
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 118 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 118 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(10)330_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 330), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)330_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 586
Number of entity ID pairs that occurred more than once: 0
Analyse set of 586 weight vectors
Containing 196 true matches and 390 true non-matches
(33.45% true matches)
Identified 562 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 549 (97.69%)
2 : 10 (1.78%)
3 : 2 (0.36%)
11 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 562 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 389
Removed 1 non-pure weight vector
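The frequency and pureness analysis above can be sketched as follows (a minimal sketch; the grouping key — the exact weight-vector tuple — and the minority-class removal rule are assumptions based on the log):

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors, count how often each occurs, and
    flag non-pure groups (those containing both matches and non-matches)
    whose minority-class copies would then be removed."""
    groups = {}
    for vec, is_match in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(is_match)
    # occurrence frequency: group size -> number of groups of that size
    freq = Counter(len(labs) for labs in groups.values())
    # pureness of each unique vector = fraction of matches among its copies
    pureness = {vec: sum(labs) / len(labs) for vec, labs in groups.items()}
    non_pure = [vec for vec, p in pureness.items() if 0.0 < p < 1.0]
    return freq, pureness, non_pure
```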
Final number of weight vectors to use: 585
Number of unique weight vectors: 562
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(562, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 562 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 562 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 27 matches and 55 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 480 weight vectors
Based on 27 matches and 55 non-matches
Classified 136 matches and 344 non-matches
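The SVM step that propagates the oracle's labels to the remaining unclassified weight vectors can be sketched as below. This is a stand-in linear SVM trained by sub-gradient descent on the hinge loss; the actual program presumably uses a library SVM whose kernel and parameters are not shown in the log.

```python
import numpy as np

def train_linear_svm(X, y, epochs=300, lr=0.1, lam=0.001):
    """Tiny linear SVM via sub-gradient descent on the hinge loss.
    y must be +/-1 (match = +1, non-match = -1)."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1.0:                       # margin violated: hinge gradient
                w = (1.0 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                  # only regularisation shrinkage
                w = (1.0 - lr * lam) * w
    return w, b

def svm_classify(w, b, X):
    """Predict match (True) / non-match (False) for each row of X."""
    return X @ w + b > 0.0
```

Trained on the oracle-labelled vectors, `svm_classify` would then split the remaining vectors into the two candidate clusters pushed onto the queue.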
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
(344, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
Current size of match and non-match training data sets: 27 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 344 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 344 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.565, 0.667, 0.600, 0.412, 0.381] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.455, 0.714, 0.429, 0.550, 0.647] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 9 matches and 59 non-matches
Purity of oracle classification: 0.868
Entropy of oracle classification: 0.564
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(10)149_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 149), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)149_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 322
Number of entity ID pairs that occurred more than once: 0
Analyse set of 322 weight vectors
Containing 208 true matches and 114 true non-matches
(64.60% true matches)
Identified 289 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 272 (94.12%)
2 : 14 (4.84%)
3 : 2 (0.69%)
16 : 1 (0.35%)
Identified 1 non-pure unique weight vector (from 289 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 111
Removed 1 non-pure weight vector
Final number of weight vectors to use: 321
Number of unique weight vectors: 289
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(289, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 289 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 72
Perform initial selection using "far" method
Farthest first selection of 72 weight vectors from 289 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 34 matches and 38 non-matches
Purity of oracle classification: 0.528
Entropy of oracle classification: 0.998
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 217 weight vectors
Based on 34 matches and 38 non-matches
Classified 151 matches and 66 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 72
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.5277777777777778, 0.9977724720899821, 0.4722222222222222)
(66, 0.5277777777777778, 0.9977724720899821, 0.4722222222222222)
Current size of match and non-match training data sets: 34 / 38
Selected cluster (queue ordering: random) with:
- Purity 0.53 and entropy 1.00
- Size 151 weight vectors
- Estimated match proportion 0.472
Sample size for this cluster: 59
Farthest first selection of 59 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 59 weight vectors
The oracle will correctly classify 59 weight vectors and wrongly classify 0
Classified 52 matches and 7 non-matches
Purity of oracle classification: 0.881
Entropy of oracle classification: 0.525
Number of true matches: 52
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 59 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
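The purity and entropy figures reported in this log follow the standard definitions for a two-class cluster: purity is the fraction belonging to the majority class, and entropy is the binary Shannon entropy of the match/non-match split. A minimal sketch (the function name is illustrative, not taken from the original code) that reproduces the numbers above:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon entropy."""
    total = num_matches + num_non_matches
    p = num_matches / total            # proportion of matches
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                    # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# 52 matches and 7 non-matches, as in the oracle output above
purity, entropy = purity_and_entropy(52, 7)
print(round(purity, 3), round(entropy, 3))  # 0.881 0.525
```

These match the logged "Purity of oracle classification: 0.881" and "Entropy of oracle classification: 0.525".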
43.0
Analysing file: diverg(20)977_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 977), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)977_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
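The frequency distribution and pureness table above can be reproduced by grouping identical weight vectors and, for each group, computing the fraction of its occurrences labelled as matches. A sketch assuming the vectors are held as (tuple, boolean-label) pairs (the function name and data layout are illustrative assumptions):

```python
from collections import Counter, defaultdict

def analyse_weight_vectors(vectors, labels):
    """Frequency distribution of identical vectors and per-vector pureness."""
    groups = defaultdict(list)
    for vec, is_match in zip(vectors, labels):
        groups[tuple(vec)].append(is_match)

    # Occurrence : number of unique vectors that occur that often
    freq_dist = Counter(len(lab) for lab in groups.values())
    # Pureness of a unique vector = proportion of its occurrences that are matches
    pureness = {vec: sum(lab) / len(lab) for vec, lab in groups.items()}
    return freq_dist, pureness

vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.1), (1.0, 0.5)]
labels = [True, True, False, False]        # one group is non-pure (2/3 matches)
freq, pure = analyse_weight_vectors(vectors, labels)
print(dict(freq))          # {3: 1, 1: 1}
print(pure[(1.0, 0.5)])    # 0.6666666666666666
```

A unique vector with pureness strictly between 0 and 1 is "non-pure"; the log then removes its minority-class occurrences before clustering.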
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
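The "farthest first" selection shown above is a greedy farthest-first traversal: starting from one vector, repeatedly pick the vector whose minimum distance to the already-selected set is largest, yielding a spread-out sample for the oracle to label. A minimal sketch; Euclidean distance and the fixed starting point are assumptions, since the log does not state either:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: pick k spread-out vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]            # assumed starting point
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Farthest point = max over remaining of min distance to selected
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(pts, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

The greedy step prefers extremes, which is why the selected lists above are dominated by vectors with many 0.000 and 1.000 components.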
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
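The SVM step trains on the oracle-labelled sample (24 matches, 63 non-matches) and classifies the remaining 829 weight vectors of the cluster into matches and non-matches, which become the two child clusters in the queue. A sketch on synthetic stand-in data; the use of scikit-learn's SVC with an RBF kernel is an assumption, as the log does not name the implementation:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-ins for oracle-labelled 7-dimensional weight vectors (shapes as above)
X_train = np.vstack([rng.uniform(0.7, 1.0, (24, 7)),   # 24 "match" samples
                     rng.uniform(0.0, 0.4, (63, 7))])  # 63 "non-match" samples
y_train = np.array([1] * 24 + [0] * 63)

clf = SVC(kernel="rbf").fit(X_train, y_train)

# Classify the remaining, unlabelled weight vectors of the cluster
X_rest = rng.uniform(0.0, 1.0, (829, 7))
pred = clf.predict(X_rest)
print("matches:", int(pred.sum()), "non-matches:", int((pred == 0).sum()))
```

The two predicted subsets are then re-queued with the purity, entropy, and estimated match proportion inherited from the oracle sample, as the Loop 2 queue listing below shows.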
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)184_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (10, 1 - acm diverg, 184), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)184_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 745
Number of entity ID pairs that occurred more than once: 0
Analyse set of 745 weight vectors
Containing 169 true matches and 576 true non-matches
(22.68% true matches)
Identified 708 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 677 (95.62%)
2 : 28 (3.95%)
3 : 2 (0.28%)
6 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 708 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 152
0.000 : 556
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 745
Number of unique weight vectors: 708
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(708, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 708 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 708 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 26 matches and 58 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.893
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 624 weight vectors
Based on 26 matches and 58 non-matches
Classified 120 matches and 504 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(120, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
(504, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
Current size of match and non-match training data sets: 26 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 504 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 504 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 3 matches and 67 non-matches
Purity of oracle classification: 0.957
Entropy of oracle classification: 0.255
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analysing file: diverg(10)745_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 745), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)745_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 666
Number of entity ID pairs that occurred more than once: 0
Analyse set of 666 weight vectors
Containing 181 true matches and 485 true non-matches
(27.18% true matches)
Identified 645 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 635 (98.45%)
2 : 7 (1.09%)
3 : 2 (0.31%)
11 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 645 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 160
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 484
Removed 1 non-pure weight vector
Final number of weight vectors to use: 665
Number of unique weight vectors: 645
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(645, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 645 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 645 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 31 matches and 52 non-matches
Purity of oracle classification: 0.627
Entropy of oracle classification: 0.953
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 562 weight vectors
Based on 31 matches and 52 non-matches
Classified 295 matches and 267 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
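The purity and entropy figures reported after each oracle round (e.g. 31 matches vs. 52 non-matches above, giving purity 0.627 and entropy 0.953) follow from the majority-class fraction and the binary Shannon entropy of the match/non-match split. A minimal sketch:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy (in bits) of the match/non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```

For the first oracle round above, `purity_entropy(31, 52)` gives (0.627, 0.953) after rounding to three decimals, matching the logged values.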
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(295, 0.6265060240963856, 0.9533171305598173, 0.37349397590361444)
(267, 0.6265060240963856, 0.9533171305598173, 0.37349397590361444)
Current size of match and non-match training data sets: 31 / 52
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 267 weight vectors
- Estimated match proportion 0.373
Sample size for this cluster: 67
Farthest first selection of 67 weight vectors from 267 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.462, 0.667, 0.600, 0.389, 0.615] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.818, 0.762, 0.714, 0.500, 0.400] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.500, 0.739, 0.824, 0.591, 0.550] (False)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.815, 0.643, 0.800, 0.750, 0.429] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 67 weight vectors
The oracle will correctly classify 67 weight vectors and wrongly classify 0
Classified 0 matches and 67 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 67 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
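The "farthest first selection" step used above to pick sample vectors from a cluster can be sketched as a greedy farthest-first traversal: seed the selection with one vector, then repeatedly add the vector whose minimum Euclidean distance to the already-selected set is largest. This is a sketch only; the start vector and distance metric of the actual run are assumptions here.

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector,
    then repeatedly pick the vector whose minimum distance to the
    already-selected set is largest (assumed: Euclidean distance,
    first vector as seed)."""
    selected = [vectors[0]]
    while len(selected) < k:
        best, best_dist = None, -1.0
        for v in vectors:
            if v in selected:
                continue
            d = min(math.dist(v, s) for s in selected)
            if d > best_dist:
                best, best_dist = v, d
        selected.append(best)
    return selected
```

Because each pick maximises the distance to the current selection, the sample spreads across the weight-vector space rather than clustering near the seed.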
Analysing file: diverg(15)258_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 258), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)258_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 635
Number of entity ID pairs that occurred more than once: 0
Analyse set of 635 weight vectors
Containing 212 true matches and 423 true non-matches
(33.39% true matches)
Identified 583 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 547 (93.83%)
2 : 33 (5.66%)
3 : 2 (0.34%)
16 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 583 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 402
Removed 1 non-pure weight vector
Final number of weight vectors to use: 634
Number of unique weight vectors: 583
Time to load and analyse the weight vector file: 0.01 sec
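The non-pure vector removal above (grouping duplicate weight vectors, computing each group's pureness as its fraction of true matches, and dropping the minority-class copies) can be sketched as follows; the exact removal rule applied to low-pureness groups is an assumption:

```python
from collections import defaultdict

def remove_non_pure(weight_vectors, labels):
    """Group duplicate weight vectors, compute each group's pureness
    (fraction of true-match labels), and drop the minority-class
    copies of any group that is not fully pure. A sketch; the actual
    rule may instead drop the whole group when the majority is weak."""
    groups = defaultdict(list)
    for vec, lab in zip(weight_vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, labs in groups.items():
        pureness = sum(labs) / len(labs)
        if 0.0 < pureness < 1.0:
            majority = pureness >= 0.5
            kept.extend((list(vec), majority) for lab in labs if lab == majority)
        else:
            kept.extend((list(vec), lab) for lab in labs)
    return kept
```

Applied to the run above, a vector occurring 16 times with pureness 0.938 (15 matches, 1 non-match) would lose its single minority-class copy, leaving 634 of the 635 vectors.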
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(583, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 583 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 583 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 27 matches and 55 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 501 weight vectors
Based on 27 matches and 55 non-matches
Classified 151 matches and 350 non-matches
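The SVM step above trains on the oracle-labelled vectors (here 27 matches and 55 non-matches) and then labels the remaining unlabelled vectors in the cluster. As a dependency-free stand-in for the SVM, a plain perceptron illustrates the same train-then-classify flow (in practice a library SVM such as scikit-learn's `SVC` would be used; the perceptron here is an explicitly swapped-in substitute):

```python
def train_perceptron(X, y, epochs=50, lr=0.1):
    """Train a linear classifier on oracle-labelled weight vectors.
    y holds True for match, False for non-match."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            target = 1.0 if yi else -1.0
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if target * score <= 0.0:  # misclassified: nudge the hyperplane
                w = [wj + lr * target * xj for wj, xj in zip(w, xi)]
                b += lr * target

    return w, b

def classify(w, b, X):
    """Label the remaining vectors with the trained separator."""
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b > 0.0 for xi in X]
```

The learned hyperplane then splits the unlabelled cluster into the two child clusters pushed onto the queue.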
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
(350, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
Current size of match and non-match training data sets: 27 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 350 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 350 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.714, 0.727, 0.750, 0.294, 0.833] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.348, 0.429, 0.526, 0.529, 0.619] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 7 matches and 61 non-matches
Purity of oracle classification: 0.897
Entropy of oracle classification: 0.478
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(10)284_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 284), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)284_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 640
Number of entity ID pairs that occurred more than once: 0
Analyse set of 640 weight vectors
Containing 177 true matches and 463 true non-matches
(27.66% true matches)
Identified 601 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 571 (95.01%)
2 : 27 (4.49%)
3 : 2 (0.33%)
9 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 601 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 158
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 442
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 631
Number of unique weight vectors: 600
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(600, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 600 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 600 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 517 weight vectors
Based on 28 matches and 55 non-matches
Classified 140 matches and 377 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(377, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 377 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 377 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.750, 0.524, 0.400, 0.813, 0.611] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.600, 0.857, 0.579, 0.286, 0.545] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.417, 0.750, 0.500, 0.455] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.857, 0.444, 0.556, 0.235, 0.500] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.714, 0.318, 0.583, 0.417, 0.727] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 1 match and 69 non-matches
Purity of oracle classification: 0.986
Entropy of oracle classification: 0.108
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(20)101_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 101), dtype: object
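The precision, recall and f-measure in the summary above follow from the tp/fp/fn counts it lists; a minimal sketch:

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from raw classification counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# Figures from the summary above: tp=39, fp=0, fn=260.
precision, recall, f_measure = prf(39, 0, 260)
```

This reproduces precision 1, recall 0.130435 and f-measure 0.230769 as reported.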
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)101_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1073
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1073 weight vectors
Containing 226 true matches and 847 true non-matches
(21.06% true matches)
Identified 1016 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 979 (96.36%)
2 : 34 (3.35%)
3 : 2 (0.20%)
20 : 1 (0.10%)
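The occurrence distribution above can be computed with two nested counts: copies per unique vector, then unique vectors per copy-count. A sketch with toy data (not the actual weight vectors):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to the number of unique weight
    vectors that occur exactly that often."""
    copies_per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(copies_per_vector.values())

# Toy data: [1.0, 0.0] occurs twice, the other two vectors once each.
vectors = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
distribution = occurrence_distribution(vectors)
```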
Identified 1 non-pure unique weight vector (from 1016 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 826
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1072
Number of unique weight vectors: 1016
Time to load and analyse the weight vector file: 0.01 sec
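The non-pure-vector filter reported above (pureness per unique vector, then removal of the minority-class copies of any mixed vector) can be sketched as follows. Labels are 1 for match and 0 for non-match; the `>= 0.5` tie rule is an assumption, as the log does not state how ties are broken:

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """Compute the pureness (fraction of match-labelled copies) of each
    unique weight vector and drop the minority-class copies of any
    vector whose copies disagree; pure vectors are kept untouched."""
    copy_labels = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        copy_labels[tuple(vec)].append(lab)
    kept_vecs, kept_labs = [], []
    for vec, lab in zip(vectors, labels):
        labs = copy_labels[tuple(vec)]
        pureness = sum(labs) / len(labs)
        majority = 1 if pureness >= 0.5 else 0  # tie rule: assumption
        if lab == majority:
            kept_vecs.append(vec)
            kept_labs.append(lab)
    return kept_vecs, kept_labs

# Toy data: [0.9, 0.9] occurs three times with labels 1, 1, 0
# (pureness 0.667), so its single non-match copy is removed.
vecs = [[0.9, 0.9], [0.9, 0.9], [0.9, 0.9], [0.1, 0.1]]
labs = [1, 1, 0, 0]
kept_vecs, kept_labs = remove_minority_copies(vecs, labs)
```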
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1016, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1016 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1016 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
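The "farthest first" selections listed in this log can be sketched as a greedy max-min traversal: start from one vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. A minimal version, assuming Euclidean distance and an arbitrary (first) starting vector:

```python
def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly pick the vector
    whose nearest already-selected vector is farthest away."""
    def dist2(a, b):
        # squared Euclidean distance (order-preserving, so no sqrt needed)
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [vectors[0]]
    remaining = vectors[1:]
    while len(selected) < k and remaining:
        far = max(remaining,
                  key=lambda v: min(dist2(v, s) for s in selected))
        selected.append(far)
        remaining.remove(far)
    return selected

# Toy example in 1-D: from [0.0], the farthest point 10.0 is picked
# next, then 5.0 (the point farthest from both ends).
pts = [[0.0], [1.0], [5.0], [9.0], [10.0]]
picked = farthest_first(pts, 3)
```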
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 929 weight vectors
Based on 31 matches and 56 non-matches
Classified 332 matches and 597 non-matches
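The "SVM classification" step trains on the oracle-labelled sample and splits the cluster's remaining weight vectors into predicted matches and non-matches, which become the two child clusters in the queue. A sketch assuming scikit-learn's `SVC` (the log does not say which SVM implementation the program actually uses):

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, unlabelled):
    """Train an SVM on the oracle-labelled sample, then split the
    remaining weight vectors into predicted matches / non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(unlabelled)
    matches = [v for v, p in zip(unlabelled, preds) if p == 1]
    non_matches = [v for v, p in zip(unlabelled, preds) if p == 0]
    return matches, non_matches

# Toy, clearly separable data (not the actual weight vectors).
train = [[0.9, 0.9], [0.8, 1.0], [0.1, 0.0], [0.2, 0.1]]
labels = [1, 1, 0, 0]
matches, non_matches = svm_split(train, labels,
                                 [[0.95, 0.85], [0.05, 0.15]])
```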
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(332, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(597, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 332 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 332 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 40 matches and 30 non-matches
Purity of oracle classification: 0.571
Entropy of oracle classification: 0.985
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 30
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)238_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 238), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)238_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 830
Number of entity ID pairs that occurred more than once: 0
Analyse set of 830 weight vectors
Containing 213 true matches and 617 true non-matches
(25.66% true matches)
Identified 776 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 741 (95.49%)
2 : 32 (4.12%)
3 : 2 (0.26%)
19 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 776 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 596
Removed 1 non-pure weight vector
Final number of weight vectors to use: 829
Number of unique weight vectors: 776
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(776, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 776 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 776 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 691 weight vectors
Based on 28 matches and 57 non-matches
Classified 149 matches and 542 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(542, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 542 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 542 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 4 matches and 69 non-matches
Purity of oracle classification: 0.945
Entropy of oracle classification: 0.306
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(10)437_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976923
recall 0.424749
f-measure 0.592075
da 130
dm 0
ndm 0
tp 127
fp 3
tn 4.76529e+07
fn 172
Name: (10, 1 - acm diverg, 437), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)437_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 662
Number of entity ID pairs that occurred more than once: 0
Analyse set of 662 weight vectors
Containing 137 true matches and 525 true non-matches
(20.69% true matches)
Identified 646 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 635 (98.30%)
2 : 8 (1.24%)
3 : 2 (0.31%)
5 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 646 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 123
0.000 : 523
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 662
Number of unique weight vectors: 646
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(646, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 646 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 646 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
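The "farthest first selection" used to draw the samples above is a greedy traversal: repeatedly pick the vector whose nearest already-selected neighbour is farthest away. A minimal sketch, assuming Euclidean distance and a fixed (rather than random) starting vector:

```python
def farthest_first(vectors, k):
    """Greedy farthest-first selection of k weight vectors."""
    def dist(a, b):
        # Euclidean distance between two weight vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    selected = [vectors[0]]  # deterministic start; the tool may seed randomly
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # pick the vector whose closest selected neighbour is farthest away
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

This maximises coverage of the cluster with few oracle queries, which is why the selected vectors above are spread across very different similarity profiles.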
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 31 matches and 52 non-matches
Purity of oracle classification: 0.627
Entropy of oracle classification: 0.953
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
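The purity, entropy, and estimated match proportion reported for each cluster follow directly from the match / non-match counts of the labelled sample; a minimal sketch:

```python
from math import log2

def cluster_stats(n_match, n_non_match):
    """Purity, binary entropy, and match proportion of a labelled sample."""
    total = n_match + n_non_match
    p = n_match / total                 # estimated match proportion
    purity = max(p, 1.0 - p)            # share of the majority class
    entropy = -sum(q * log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy, p
```

For the 31 matches and 52 non-matches above this gives purity 0.627, entropy 0.953, and match proportion 0.373, matching the figures the log reports for the resulting clusters.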
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 563 weight vectors
Based on 31 matches and 52 non-matches
Classified 84 matches and 479 non-matches
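The SVM step propagates the oracle labels to the still-unclassified weight vectors, splitting them into predicted matches and non-matches. A sketch assuming scikit-learn's `SVC`; the kernel and parameters actually used by the tool are not shown in the log:

```python
from sklearn.svm import SVC

def svm_classify(train_vectors, train_labels, rest_vectors):
    """Fit an SVM on the oracle-labelled vectors, then split the
    remaining vectors into predicted matches and non-matches."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vectors, train_labels)
    predictions = clf.predict(rest_vectors)
    matches = [v for v, p in zip(rest_vectors, predictions) if p]
    non_matches = [v for v, p in zip(rest_vectors, predictions) if not p]
    return matches, non_matches
```

The two predicted groups become the two new clusters pushed onto the queue (here: 84 matches and 479 non-matches).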
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(84, 0.6265060240963856, 0.9533171305598173, 0.37349397590361444)
(479, 0.6265060240963856, 0.9533171305598173, 0.37349397590361444)
Current size of match and non-match training data sets: 31 / 52
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 84 weight vectors
- Estimated match proportion 0.373
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 84 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 40 matches and 4 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
130.0
Analysing the file: diverg(10)929_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 929), dtype: object
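The reported f-measure is the harmonic mean of precision and recall (F1); a minimal sketch:

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With the values above, `f_measure(0.987805, 0.270903)` is approximately 0.4252, consistent with the reported 0.425197.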
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)929_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 885
Number of entity ID pairs that occurred more than once: 0
Analyse set of 885 weight vectors
Containing 177 true matches and 708 true non-matches
(20.00% true matches)
Identified 846 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 816 (96.45%)
2 : 27 (3.19%)
3 : 2 (0.24%)
9 : 1 (0.12%)
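The frequency distribution above can be reproduced with two nested counts: occurrences per unique weight vector, then a tally of those occurrence counts. A minimal sketch:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of unique
    weight vectors that occur exactly that often."""
    per_vector = Counter(map(tuple, vectors))  # occurrences per unique vector
    return Counter(per_vector.values())        # tally of those counts
```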
Identified 1 non-pure unique weight vector (from 846 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 158
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 687
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 876
Number of unique weight vectors: 845
Time to load and analyse the weight vector file: 0.01 sec
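Non-pure unique weight vectors (identical similarity vectors carrying both match and non-match labels) are filtered before clustering. The log reports two variants: removing all occurrences of a non-pure vector, or only its minority-class occurrences. A sketch of the minority-class variant, under the assumption that ties count as matches:

```python
from collections import defaultdict

def remove_minority_of_non_pure(vectors, labels):
    """Keep only occurrences that agree with the majority label of
    their unique weight vector, dropping the minority class of every
    non-pure vector (one of the two variants the log reports)."""
    groups = defaultdict(list)
    for vec, lab in zip(map(tuple, vectors), labels):
        groups[vec].append(lab)
    # majority label per unique vector; a tie counts as a match (assumption)
    majority = {vec: sum(labs) * 2 >= len(labs) for vec, labs in groups.items()}
    return [(vec, lab) for vec, lab in zip(map(tuple, vectors), labels)
            if lab == majority[vec]]
```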
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(845, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 845 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 845 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 759 weight vectors
Based on 28 matches and 58 non-matches
Classified 140 matches and 619 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(619, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 619 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 619 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 1 match and 73 non-matches
Purity of oracle classification: 0.986
Entropy of oracle classification: 0.103
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(15)351_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 351), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)351_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 848
Number of entity ID pairs that occurred more than once: 0
Analyse set of 848 weight vectors
Containing 214 true matches and 634 true non-matches
(25.24% true matches)
Identified 794 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 759 (95.59%)
2 : 32 (4.03%)
3 : 2 (0.25%)
19 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 794 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 613
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 847
Number of unique weight vectors: 794
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(794, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 794 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 794 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.722, 0.471, 0.545, 0.579] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.538, 0.500, 0.818, 0.789, 0.750] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.033, 0.300, 0.524, 0.727, 0.762] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 709 weight vectors
Based on 27 matches and 58 non-matches
Classified 123 matches and 586 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(586, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 586 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 586 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[0.917, 0.000, 0.550, 0.455, 0.455, 0.000, 0.000] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.500, 0.917, 0.818, 0.714, 0.611] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.833, 0.833, 0.550, 0.500, 0.688] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.783, 0.357, 0.750, 0.412, 0.238] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.636, 0.545, 0.368, 0.563, 0.462] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 17 matches and 56 non-matches
Purity of oracle classification: 0.767
Entropy of oracle classification: 0.783
Number of true matches: 17
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(20)276_NEW.csv
<class 'pandas.core.series.Series'>
Current line here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 276), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)276_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
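The pureness filtering above groups identical weight vectors, computes each group's match share, and removes the minority-class copies of any non-pure group. A minimal sketch (the grouping key, data, and filtering rule are assumptions for illustration):

```python
from collections import defaultdict

def pureness_filter(labelled_vectors):
    """Group identical weight vectors, compute the match share (pureness)
    of each group, and drop the minority-class copies of non-pure groups."""
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[vec].append(is_match)
    kept = []
    for vec, labels in groups.items():
        match_share = sum(labels) / len(labels)  # pureness as share of matches
        if match_share in (0.0, 1.0):
            kept.extend((vec, m) for m in labels)  # pure group: keep all copies
        else:
            majority = match_share >= 0.5
            # non-pure group: keep only the majority-class copies
            kept.extend((vec, m) for m in labels if m == majority)
    return kept

# 19 true + 1 false copy of the same vector (pureness 0.95), plus one pure vector
data = [((1.0, 0.9), True)] * 19 + [((1.0, 0.9), False)] + [((0.1, 0.0), False)]
print(len(pureness_filter(data)))  # 20: the single minority-class copy is removed
```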
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
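Farthest-first selection, as used for the sample above, is typically the greedy farthest-point traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and seeding with the first vector (both assumptions; the log does not state them):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-point traversal: repeatedly add the vector whose
    minimum distance to the selected set is largest."""
    selected = [vectors[0]]  # assumed: seed with the first vector
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # for each candidate, score it by its distance to the closest selected vector
        best = max(remaining, key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(points, 3))  # picks well-spread points, skipping (0.1, 0.0)
```

This quadratic version is fine at the cluster sizes seen here; caching each candidate's current minimum distance makes it O(nk).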
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
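The purity and entropy reported for each oracle sample are the majority-class share and the binary Shannon entropy of the match proportion; with the 23 matches and 65 non-matches above, these reproduce the logged 0.739 and 0.829:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    total = num_matches + num_non_matches
    p = num_matches / total           # match proportion
    purity = max(p, 1.0 - p)          # majority-class share
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)  # binary Shannon entropy
    return purity, entropy

pu, en = purity_entropy(23, 65)
print(f"{pu:.3f} {en:.3f}")  # 0.739 0.829
```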
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
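The classification step above trains on the oracle-labelled vectors and splits the remaining cluster into predicted matches and non-matches, which become the two new queue entries. The log's classifier is an SVM; as a dependency-free stand-in, here is the same split with a nearest-centroid rule (a deliberate simplification, not the SVM actually used):

```python
def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def split_cluster(unlabelled, matches, non_matches):
    """Split a cluster into predicted matches / non-matches by distance
    to the centroids of the labelled training vectors."""
    cm, cn = centroid(matches), centroid(non_matches)
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    pred_m = [v for v in unlabelled if sqdist(v, cm) <= sqdist(v, cn)]
    pred_n = [v for v in unlabelled if sqdist(v, cm) > sqdist(v, cn)]
    return pred_m, pred_n

matches = [(0.9, 1.0), (1.0, 0.8)]
non_matches = [(0.1, 0.0), (0.0, 0.2)]
m, n = split_cluster([(0.95, 0.9), (0.05, 0.1)], matches, non_matches)
print(len(m), len(n))  # 1 1
```

With scikit-learn available, the same split would use `SVC().fit(X_train, y_train)` followed by `predict` on the unlabelled vectors.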
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)490_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 490), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)490_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 401
Number of entity ID pairs that occurred more than once: 0
Analyse set of 401 weight vectors
Containing 217 true matches and 184 true non-matches
(54.11% true matches)
Identified 368 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 352 (95.65%)
2 : 13 (3.53%)
3 : 2 (0.54%)
17 : 1 (0.27%)
Identified 1 non-pure unique weight vector (from 368 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 183
Removed 1 non-pure weight vector
Final number of weight vectors to use: 400
Number of unique weight vectors: 368
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(368, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 368 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 368 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 41 matches and 35 non-matches
Purity of oracle classification: 0.539
Entropy of oracle classification: 0.995
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 35
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 292 weight vectors
Based on 41 matches and 35 non-matches
Classified 148 matches and 144 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.5394736842105263, 0.9954993847275952, 0.5394736842105263)
(144, 0.5394736842105263, 0.9954993847275952, 0.5394736842105263)
Current size of match and non-match training data sets: 41 / 35
Selected cluster with (queue ordering: random):
- Purity 0.54 and entropy 1.00
- Size 148 weight vectors
- Estimated match proportion 0.539
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 148 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 46 matches and 12 non-matches
Purity of oracle classification: 0.793
Entropy of oracle classification: 0.736
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 12
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(15)832_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 832), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)832_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 407
Number of entity ID pairs that occurred more than once: 0
Analyse set of 407 weight vectors
Containing 217 true matches and 190 true non-matches
(53.32% true matches)
Identified 370 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 352 (95.14%)
2 : 15 (4.05%)
3 : 2 (0.54%)
19 : 1 (0.27%)
Identified 1 non-pure unique weight vector (from 370 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 187
Removed 1 non-pure weight vector
Final number of weight vectors to use: 406
Number of unique weight vectors: 370
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(370, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 370 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 370 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 29 matches and 47 non-matches
Purity of oracle classification: 0.618
Entropy of oracle classification: 0.959
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 47
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 294 weight vectors
Based on 29 matches and 47 non-matches
Classified 145 matches and 149 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.618421052631579, 0.959149554396894, 0.3815789473684211)
(149, 0.618421052631579, 0.959149554396894, 0.3815789473684211)
Current size of match and non-match training data sets: 29 / 47
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 149 weight vectors
- Estimated match proportion 0.382
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 149 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 9 matches and 48 non-matches
Purity of oracle classification: 0.842
Entropy of oracle classification: 0.629
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
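The purity and entropy figures logged for each oracle-classified sample can be reproduced directly from the match/non-match counts, assuming purity is the majority-class fraction and entropy the binary Shannon entropy of the match proportion (both assumptions agree with the numbers in this log):

```python
import math

def purity_entropy(num_matches, num_nonmatches):
    """Majority-class purity and binary Shannon entropy (in bits) of a
    sample with the given oracle-classified class counts."""
    total = num_matches + num_nonmatches
    p = num_matches / total                  # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                          # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# Figures for the sample above: 9 matches, 48 non-matches.
purity, entropy = purity_entropy(9, 48)
print(round(purity, 3), round(entropy, 3))   # 0.842 0.629
```

The match proportion p = 9/57 ≈ 0.158 is also what the tool reports as the cluster's estimated match proportion.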
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
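The control flow visible in each run above — a queue of clusters that are popped, sampled, oracle-labelled, and split until the manual-classification budget is exhausted — can be sketched as follows. The helper parameters (`select`, `split`, `purity_threshold`, `max_cluster`) are hypothetical stand-ins; the tool's actual thresholds and stopping rules are not shown in the log.

```python
from collections import deque

def active_cluster_loop(vectors, oracle, budget, sample_size, select, split,
                        purity_threshold=0.95, max_cluster=10):
    """Queue-driven loop: pop a cluster, sample vectors from it, label the
    sample with the oracle, and split the rest into two child clusters
    until the labelling budget runs out. `oracle`, `select` (sampling) and
    `split` (classifier) are supplied by the caller; only the control flow
    is fixed here."""
    queue = deque([list(vectors)])
    labelled = 0
    while queue and labelled < budget:
        cluster = queue.popleft()
        sample = select(cluster, min(sample_size, len(cluster)))
        labels = [oracle(v) for v in sample]
        labelled += len(sample)
        rest = [v for v in cluster if v not in sample]
        purity = max(sum(labels), len(labels) - sum(labels)) / len(labels)
        if rest and (purity < purity_threshold or len(rest) > max_cluster):
            # split into predicted-match / predicted-non-match children
            queue.extend(c for c in split(sample, labels, rest) if c)
    return labelled

# Toy run: a threshold on the first component is the ground truth.
vecs = [[0.1], [0.2], [0.8], [0.9], [0.3], [0.7]]
oracle = lambda v: v[0] > 0.5
select = lambda c, k: c[:k]                  # stand-in for farthest-first
split = lambda s, l, rest: ([v for v in rest if v[0] > 0.5],
                            [v for v in rest if v[0] <= 0.5])
used = active_cluster_loop(vecs, oracle, budget=4, sample_size=2,
                           select=select, split=split, max_cluster=3)
```

In the log, a run ends either with "Reached end of manual classification budget" (budget exhausted with clusters still queued) or with every cluster classified.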
Analysing the file: diverg(15)981_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 981), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)981_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 346
Number of entity ID pairs that occurred more than once: 0
Analyse set of 346 weight vectors
Containing 212 true matches and 134 true non-matches
(61.27% true matches)
Identified 312 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 294 (94.23%)
2 : 15 (4.81%)
3 : 2 (0.64%)
16 : 1 (0.32%)
Identified 1 non-pure unique weight vector (from 312 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 131
Removed 1 non-pure weight vector
Final number of weight vectors to use: 345
Number of unique weight vectors: 312
Time to load and analyse the weight vector file: 0.00 sec
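The analysis phase above — grouping identical weight vectors, reporting their occurrence frequency distribution, and flagging non-pure vectors (identical vectors carrying both match and non-match labels) — can be sketched as below, assuming "pureness" is the fraction of true matches among the copies of one unique vector:

```python
from collections import Counter, defaultdict

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors, report their occurrence frequency
    distribution, and flag non-pure vectors (identical vectors that carry
    both match and non-match labels)."""
    groups = defaultdict(list)
    for vec, is_match in zip(vectors, labels):
        groups[tuple(vec)].append(is_match)
    freq = Counter(len(lab) for lab in groups.values())
    pureness = {vec: sum(lab) / len(lab) for vec, lab in groups.items()}
    non_pure = [vec for vec, p in pureness.items() if 0.0 < p < 1.0]
    return freq, pureness, non_pure

# Toy example: one vector occurs twice with conflicting labels.
vecs = [[1.0, 0.9], [1.0, 0.9], [0.1, 0.2]]
labs = [True, False, False]
freq, pureness, non_pure = analyse_weight_vectors(vecs, labs)
print(dict(freq))          # {2: 1, 1: 1}
print(non_pure)            # [(1.0, 0.9)]
```

For a non-pure vector the tool then removes the minority-class copies, which is why 346 vectors reduce to 345 in this run.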
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(312, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 312 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 73
Perform initial selection using "far" method
Farthest first selection of 73 weight vectors from 312 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
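The "far" selection used to pick the samples listed above is a farthest-first traversal (Gonzalez-style): starting from one vector, repeatedly add the vector whose nearest already-selected neighbour is farthest away. A minimal sketch follows; Euclidean distance and first-vector seeding are assumptions, as the tool's actual metric and seed are not shown in the log.

```python
def farthest_first(vectors, k, dist=None):
    """Farthest-first traversal: start from the first vector, then
    repeatedly pick the vector whose minimum distance to the already
    selected vectors is maximal, until k vectors are chosen."""
    if dist is None:
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    selected = [vectors[0]]
    remaining = vectors[1:]
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example in 1-D: from {0, 1, 9, 10}, picking 3 starting at 0 yields
# the spread-out points 0, 10, and then one of the tied points 1 and 9.
pts = [[0.0], [1.0], [9.0], [10.0]]
sel = farthest_first(pts, 3)
```

This is why the selected vectors above are so heterogeneous: the method maximises spread over the cluster rather than sampling by density.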
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 35 matches and 38 non-matches
Purity of oracle classification: 0.521
Entropy of oracle classification: 0.999
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 239 weight vectors
Based on 35 matches and 38 non-matches
Classified 150 matches and 89 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 73
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.5205479452054794, 0.998781393072756, 0.4794520547945205)
(89, 0.5205479452054794, 0.998781393072756, 0.4794520547945205)
Current size of match and non-match training data sets: 35 / 38
Selected cluster with (queue ordering: random):
- Purity 0.52 and entropy 1.00
- Size 89 weight vectors
- Estimated match proportion 0.479
Sample size for this cluster: 46
Farthest first selection of 46 weight vectors from 89 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
Perform oracle with 100.00% accuracy on 46 weight vectors
The oracle will correctly classify 46 weight vectors and wrongly classify 0
Classified 3 matches and 43 non-matches
Purity of oracle classification: 0.935
Entropy of oracle classification: 0.348
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
Deleted 46 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing the file: diverg(10)974_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979592
recall 0.32107
f-measure 0.483627
da 98
dm 0
ndm 0
tp 96
fp 2
tn 4.76529e+07
fn 203
Name: (10, 1 - acm diverg, 974), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)974_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 315
Number of entity ID pairs that occurred more than once: 0
Analyse set of 315 weight vectors
Containing 159 true matches and 156 true non-matches
(50.48% true matches)
Identified 299 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 289 (96.66%)
2 : 7 (2.34%)
3 : 2 (0.67%)
6 : 1 (0.33%)
Identified 0 non-pure unique weight vectors (from 299 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 143
0.000 : 156
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 315
Number of unique weight vectors: 299
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(299, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 299 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 73
Perform initial selection using "far" method
Farthest first selection of 73 weight vectors from 299 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 49 matches and 24 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 24
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 226 weight vectors
Based on 49 matches and 24 non-matches
Classified 226 matches and 0 non-matches
98.0
Analysing the file: diverg(15)778_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 778), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)778_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.722, 0.471, 0.545, 0.579] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.538, 0.500, 0.818, 0.789, 0.750] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.033, 0.300, 0.524, 0.727, 0.762] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 27 matches and 59 non-matches
Purity of oracle classification: 0.686
Entropy of oracle classification: 0.898
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 27 matches and 59 non-matches
Classified 134 matches and 585 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(134, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
(585, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
Current size of match and non-match training data sets: 27 / 59
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.90
- Size 585 weight vectors
- Estimated match proportion 0.314
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 585 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.800, 0.000, 0.625, 0.571, 0.467, 0.474, 0.667] (False)
[1.000, 0.000, 0.350, 0.455, 0.625, 0.000, 0.000] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.333, 0.214, 0.333, 0.588, 0.476] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.625, 0.174, 0.333, 0.259, 0.286] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.857, 0.111, 0.444, 0.529, 0.500] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.583, 0.571, 0.778, 0.471, 0.500] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.571, 0.333, 0.444, 0.400] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.875, 0.467, 0.471, 0.833, 0.571] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 15 matches and 57 non-matches
Purity of oracle classification: 0.792
Entropy of oracle classification: 0.738
Number of true matches: 15
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
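The purity and entropy figures reported above can be reproduced from the match/non-match counts alone: purity is the majority-class fraction of the oracle sample, entropy the binary Shannon entropy of the split. A minimal sketch (the function name is ours, not from the original program):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity: fraction of the majority class in the oracle sample.
    Entropy: binary Shannon entropy of the match/non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 15 matches and 57 non-matches, as in the oracle result above
purity, entropy = purity_entropy(15, 57)
print(round(purity, 3), round(entropy, 3))  # 0.792 0.738
```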
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)829_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 829), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)829_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 799
Number of entity ID pairs that occurred more than once: 0
Analyse set of 799 weight vectors
Containing 213 true matches and 586 true non-matches
(26.66% true matches)
Identified 747 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 712 (95.31%)
2 : 32 (4.28%)
3 : 2 (0.27%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 747 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 565
Removed 1 non-pure weight vector
Final number of weight vectors to use: 798
Number of unique weight vectors: 747
Time to load and analyse the weight vector file: 0.01 sec
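The deduplication step above (799 vectors reduced to 747 unique ones, plus the occurrence histogram) can be sketched with a `Counter` over vector tuples. The vectors below are hypothetical stand-ins for the CSV contents:

```python
from collections import Counter

# Hypothetical similarity weight vectors (one tuple per record pair);
# the real run loads these from a CSV such as the file named above.
vectors = [
    (1.0, 0.0, 0.7),
    (1.0, 0.0, 0.7),   # duplicate of the first vector
    (0.5, 1.0, 0.3),
    (0.9, 0.2, 0.8),
]

counts = Counter(vectors)            # occurrences per unique vector
num_unique = len(counts)
# Frequency distribution: occurrence count -> number of unique vectors
freq_dist = Counter(counts.values())

print(num_unique)        # 3
print(dict(freq_dist))   # {2: 1, 1: 2}
```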
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(747, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 747 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 747 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
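The "far" selection above is a greedy farthest-first traversal: starting from a seed vector, it repeatedly adds the vector whose minimum Euclidean distance to the already selected set is largest. A small sketch under stated assumptions (seed choice and distance metric are assumptions; the real run picks 85 of 747 seven-dimensional vectors):

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first traversal: repeatedly add the vector whose
    minimum Euclidean distance to the selected set so far is largest."""
    selected = [seed_index]
    # Minimum distance from each vector to the selected set so far
    min_dist = [math.dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            d = math.dist(v, vectors[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected

# Toy 2-D example: the two extremes are picked before the centre point
vecs = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
print(farthest_first(vecs, 3))  # [0, 1, 4]
```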
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 662 weight vectors
Based on 27 matches and 58 non-matches
Classified 86 matches and 576 non-matches
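The step above trains an SVM on the oracle-labelled sample and uses it to split the remaining unlabelled vectors of the cluster into predicted matches and non-matches. A sketch assuming scikit-learn with a linear kernel (the actual kernel and parameters are not shown in the log, and the data here is synthetic; only the sizes mirror Loop 1 above):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic oracle-labelled training data (1 = match, 0 = non-match);
# the real run has 27 match and 58 non-match weight vectors of dim 7.
X_match = rng.uniform(0.6, 1.0, size=(27, 7))
X_non = rng.uniform(0.0, 0.5, size=(58, 7))
X_train = np.vstack([X_match, X_non])
y_train = np.array([1] * 27 + [0] * 58)

# Remaining unlabelled vectors in the cluster (662 in the log above)
X_rest = rng.uniform(0.0, 1.0, size=(662, 7))

clf = SVC(kernel="linear")  # kernel choice is an assumption
clf.fit(X_train, y_train)
pred = clf.predict(X_rest)
print((pred == 1).sum(), (pred == 0).sum())  # predicted matches / non-matches
```

The two predicted groups then become the two new clusters pushed onto the queue.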
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(86, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(576, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 576 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 576 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 20 matches and 53 non-matches
Purity of oracle classification: 0.726
Entropy of oracle classification: 0.847
Number of true matches: 20
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(10)91_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 91), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)91_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 351
Number of entity ID pairs that occurred more than once: 0
Analyse set of 351 weight vectors
Containing 172 true matches and 179 true non-matches
(49.00% true matches)
Identified 330 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 318 (96.36%)
2 : 9 (2.73%)
3 : 2 (0.61%)
9 : 1 (0.30%)
Identified 1 non-pure unique weight vector (from 330 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 153
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 176
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 342
Number of unique weight vectors: 329
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(329, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 329 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 74
Perform initial selection using "far" method
Farthest first selection of 74 weight vectors from 329 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 31 matches and 43 non-matches
Purity of oracle classification: 0.581
Entropy of oracle classification: 0.981
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 255 weight vectors
Based on 31 matches and 43 non-matches
Classified 125 matches and 130 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 74
Size, purity, entropy, and estimated match proportion of clusters in queue:
(125, 0.581081081081081, 0.9809470132751208, 0.4189189189189189)
(130, 0.581081081081081, 0.9809470132751208, 0.4189189189189189)
Current size of match and non-match training data sets: 31 / 43
Selected cluster (queue ordering: random) with:
- Purity 0.58 and entropy 0.98
- Size 130 weight vectors
- Estimated match proportion 0.419
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 130 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[0.367, 1.000, 0.154, 0.174, 0.125, 0.240, 0.226] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.800, 0.636, 0.563, 0.545, 0.722] (False)
[0.488, 1.000, 0.231, 0.167, 0.105, 0.122, 0.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 3 matches and 52 non-matches
Purity of oracle classification: 0.945
Entropy of oracle classification: 0.305
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
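The decision on this line follows a simple rule: a cluster is only finalised when the oracle sample is pure enough and the cluster small enough; otherwise it stays eligible for further splitting. The thresholds below are illustrative assumptions, since the log does not state the actual values used:

```python
# Hypothetical thresholds: min_purity and max_size are assumptions,
# not values taken from the original program.
def can_split_further(size, purity, min_purity=0.95, max_size=10):
    """A cluster is processed further if it is not pure enough or
    still too large; otherwise it can be classified as a whole."""
    return purity < min_purity or size > max_size

# Loop 2 above: 130 - 55 = 75 vectors remain, oracle purity 0.945
print(can_split_further(75, 0.945))  # True
```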
Reached end of manual classification budget
82.0
Analysing file: diverg(15)617_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 617), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)617_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 952
Number of entity ID pairs that occurred more than once: 0
Analyse set of 952 weight vectors
Containing 201 true matches and 751 true non-matches
(21.11% true matches)
Identified 907 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 873 (96.25%)
2 : 31 (3.42%)
3 : 2 (0.22%)
11 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 907 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 730
Removed 1 non-pure weight vector
Final number of weight vectors to use: 951
Number of unique weight vectors: 907
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(907, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 907 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 907 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
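The purity and entropy figures above follow directly from the match/non-match counts. A minimal sketch (assuming purity is the majority-class fraction and entropy the base-2 Shannon entropy of the two class proportions, which reproduces the logged values):

```python
import math

def oracle_stats(num_matches, num_non_matches):
    """Purity and entropy of a binary oracle classification.

    Sketch reconstructing the statistics printed in this log:
    purity = fraction of the majority class, entropy = Shannon
    entropy (base 2) of the match/non-match proportions.
    """
    total = num_matches + num_non_matches
    p = num_matches / total
    q = num_non_matches / total
    purity = max(p, q)
    entropy = sum(-x * math.log2(x) for x in (p, q) if x > 0.0)
    return purity, entropy

# Loop 1 oracle result above: 24 matches, 63 non-matches
purity, entropy = oracle_stats(24, 63)
print(round(purity, 3), round(entropy, 3))  # → 0.724 0.85
```

With 46 matches and 0 non-matches (as in the later loop) the same formulas give purity 1.0 and entropy 0.0, matching the log.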
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 820 weight vectors
Based on 24 matches and 63 non-matches
Classified 115 matches and 705 non-matches
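The SVM step above trains on the oracle-labelled vectors and classifies the remaining ones. A hedged sketch using scikit-learn's `SVC` (the log does not name the library or kernel, and the data here is synthetic stand-in data, not the real weight vectors):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the oracle-labelled training data
# (24 matches, 63 non-matches) over 7 similarity weights.
X_match = rng.random((24, 7)) * 0.3 + 0.7  # match-like: high similarities
X_non = rng.random((63, 7)) * 0.5          # non-match-like: low similarities
X_train = np.vstack([X_match, X_non])
y_train = np.array([1] * 24 + [0] * 63)

# The 820 still-unclassified weight vectors (stand-in).
X_rest = rng.random((820, 7))

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
pred = clf.predict(X_rest)

n_matches = int(pred.sum())
n_non_matches = len(pred) - n_matches
print(n_matches, n_non_matches)
```

The predicted matches and non-matches then become the two child clusters pushed onto the queue, as the "Loop 2: Queue length: 2" line shows.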
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(115, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(705, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 115 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 46
Farthest first selection of 46 weight vectors from 115 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
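The farthest-first selection used above can be sketched as a greedy traversal: repeatedly pick the vector whose distance to the already-selected set is largest. The start point and the Euclidean metric below are assumptions; the tool's exact variant may differ.

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: select k row indices so that
    each new pick maximises its distance to the selected set."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vectors)))]
    # Distance of every vector to the nearest selected vector so far.
    dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(dist.argmax())
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

# E.g. selecting 46 of 115 seven-dimensional vectors, as in this loop.
pts = np.random.default_rng(1).random((115, 7))
idx = farthest_first(pts, 46)
print(len(idx))  # → 46
```

Because each pick maximises the minimum distance to the sample so far, the selection spreads over the whole cluster rather than concentrating in dense regions.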
Perform oracle with 100.00% accuracy on 46 weight vectors
The oracle will correctly classify 46 weight vectors and wrongly classify 0
Classified 46 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 46 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(15)553_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 553), dtype: object
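The precision, recall and f-measure values in the Series above follow from the tp/fp/fn counts. A minimal reconstruction (the tn count does not enter any of the three):

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Values from the diverg(15)553 result row: tp=45, fp=1, fn=254
p, r, f = prf(45, 1, 254)
print(round(p, 6), round(r, 6), round(f, 5))  # → 0.978261 0.150502 0.26087
```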
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)553_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1059
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1059 weight vectors
Containing 219 true matches and 840 true non-matches
(20.68% true matches)
Identified 1003 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 967 (96.41%)
2 : 33 (3.29%)
3 : 2 (0.20%)
20 : 1 (0.10%)
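The occurrence distribution above can be computed with two nested `Counter`s: first count how often each unique vector occurs, then count how many unique vectors share each occurrence count. A sketch with a small hypothetical vector list (tuples, so they are hashable):

```python
from collections import Counter

# Hypothetical weight vectors; the real tool reads them from the CSV.
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5),
           (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]

occurrences = Counter(vectors)                # vector -> how often it occurs
distribution = Counter(occurrences.values())  # occurrence count -> #vectors

total = len(occurrences)  # number of unique vectors
for occ, count in sorted(distribution.items()):
    print(f"{occ} : {count} ({100.0 * count / total:.2f}%)")
```

The percentages in the log are relative to the number of unique vectors (967 + 33 + 2 + 1 = 1003), not to the raw 1059 vectors.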
Identified 1 non-pure unique weight vector (from 1003 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 819
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1058
Number of unique weight vectors: 1003
Time to load and analyse the weight vector file: 0.01 sec
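The non-pure-vector removal step above computes, for each unique weight vector, the fraction of its labelled copies that are matches, and drops the minority-class copies when that fraction is strictly between 0 and 1. This removal rule is inferred from the log's wording ("minority class weight vectors with this pureness to be removed"), not taken from the tool's source:

```python
# Hypothetical labelled copies of one unique weight vector:
# 19 matches and 1 non-match give pureness 0.95, as in the log.
labels = [True] * 19 + [False]
pureness = sum(labels) / len(labels)

if 0.0 < pureness < 1.0:
    majority = pureness >= 0.5
    # Drop the minority-class copies so the vector becomes pure.
    labels = [lab for lab in labels if lab == majority]

print(pureness, len(labels))  # → 0.95 19
```

This matches the summary lines: one non-pure unique vector, one copy removed, leaving 1058 of the original 1059 vectors.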
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1003, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1003 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1003 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 916 weight vectors
Based on 31 matches and 56 non-matches
Classified 319 matches and 597 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(319, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(597, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 597 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 597 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.923, 0.667, 0.667, 0.412, 0.571] (False)
[0.667, 0.000, 0.667, 0.500, 0.647, 0.556, 0.684] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.750, 0.429, 0.526, 0.500, 0.846] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.750, 0.524, 0.400, 0.813, 0.611] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 0.583, 0.444, 0.412, 0.318, 0.421] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.233, 0.545, 0.714, 0.455, 0.238] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.462, 0.889, 0.455, 0.211, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(15)861_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (15, 1 - acm diverg, 861), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)861_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 901
Number of entity ID pairs that occurred more than once: 0
Analyse set of 901 weight vectors
Containing 208 true matches and 693 true non-matches
(23.09% true matches)
Identified 848 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 813 (95.87%)
2 : 32 (3.77%)
3 : 2 (0.24%)
18 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 848 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 672
Removed 1 non-pure weight vector
Final number of weight vectors to use: 900
Number of unique weight vectors: 848
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(848, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 848 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 848 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 30 matches and 56 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 762 weight vectors
Based on 30 matches and 56 non-matches
Classified 192 matches and 570 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(192, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
(570, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
Current size of match and non-match training data sets: 30 / 56
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 570 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 570 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.429, 0.571, 0.333, 0.444, 0.400] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.833, 0.500, 0.368, 0.235, 0.429] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.667, 0.444, 0.556, 0.222, 0.143] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 0 matches and 76 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 76
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
48.0
Analysing the file: diverg(15)188_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 188), dtype: object
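The precision, recall, and f-measure values in each per-file summary follow from the tp/fp/fn counts in that same summary via the standard formulas (f-measure being the harmonic mean of precision and recall). A quick check, with an illustrative function name:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from raw tp/fp/fn counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = prf(43, 0, 256)   # p = 1.0, r ≈ 0.143813, f ≈ 0.251462
```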
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)188_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 901
Number of entity ID pairs that occurred more than once: 0
Analyse set of 901 weight vectors
Containing 213 true matches and 688 true non-matches
(23.64% true matches)
Identified 849 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 813 (95.76%)
2 : 33 (3.89%)
3 : 2 (0.24%)
16 : 1 (0.12%)
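The occurrence histogram above is a counter of counts: first count how often each unique weight vector occurs, then count how many unique vectors share each occurrence count. A sketch, assuming the vectors are stored as hashable tuples:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of unique vectors
    that occur exactly that often."""
    per_vector = Counter(map(tuple, vectors))   # vector -> times seen
    return Counter(per_vector.values())         # times seen -> #vectors

dist = occurrence_distribution([(1,), (1,), (2,), (3,)])
# one vector occurs twice, two vectors occur once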
Identified 1 non-pure unique weight vector (from 849 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 181
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 667
Removed 1 non-pure weight vector
Final number of weight vectors to use: 900
Number of unique weight vectors: 849
Time to load and analyse the weight vector file: 0.01 sec
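The non-pure filtering above can be sketched as follows: group the labelled weight vectors by value, compute each group's pureness as the fraction of match labels, and for any group that is neither fully pure nor fully impure drop its minority-class records (here one of 16 duplicates, giving pureness 15/16 ≈ 0.938). The function and record layout are assumptions, not the original code:

```python
from collections import defaultdict

def remove_minority_records(records):
    """records: list of (weight_vector, is_match) pairs.
    Drop minority-class records of every non-pure unique vector."""
    labels = defaultdict(list)
    for vec, is_match in records:
        labels[tuple(vec)].append(is_match)
    kept = []
    for vec, is_match in records:
        group = labels[tuple(vec)]
        pureness = sum(group) / len(group)      # fraction of matches
        if pureness in (0.0, 1.0):              # already pure: keep all
            kept.append((vec, is_match))
        elif is_match == (pureness >= 0.5):     # keep only the majority class
            kept.append((vec, is_match))
    return kept
```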
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(849, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 849 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 849 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
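The "far" initial selection is a greedy farthest-first traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the cluster. A minimal sketch; the Euclidean metric and random starting point are assumptions about the original method:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedily select k well-spread vectors by farthest-first traversal."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    selected = [int(rng.integers(len(X)))]               # arbitrary start
    dists = np.linalg.norm(X - X[selected[0]], axis=1)   # min dist to selection
    while len(selected) < k:
        nxt = int(np.argmax(dists))                      # farthest from selection
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```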
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 763 weight vectors
Based on 29 matches and 57 non-matches
Classified 180 matches and 583 non-matches
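The split step trains an SVM on the oracle-labelled samples and partitions the remaining cluster by the predicted class. A minimal sketch assuming scikit-learn's `SVC` with a linear kernel; the log does not name the actual SVM implementation or its parameters:

```python
import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn is available

def svm_split(match_vecs, non_match_vecs, cluster):
    """Train on labelled match/non-match vectors, then split the
    remaining cluster into predicted matches and non-matches."""
    X = np.vstack([match_vecs, non_match_vecs])
    y = np.array([1] * len(match_vecs) + [0] * len(non_match_vecs))
    clf = SVC(kernel="linear").fit(X, y)
    pred = clf.predict(np.asarray(cluster, dtype=float))
    matches = [v for v, p in zip(cluster, pred) if p == 1]
    non_matches = [v for v, p in zip(cluster, pred) if p == 0]
    return matches, non_matches
```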
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(180, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(583, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 583 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 583 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.346, 0.522, 0.765, 0.769, 0.455] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.538, 0.789, 0.353, 0.545, 0.550] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.857, 0.417, 0.750, 0.500, 0.455] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.692, 0.292, 0.500, 0.818, 0.308] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 0 matches and 75 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 75
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing the file: diverg(20)175_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 175), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)175_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 789
Number of entity ID pairs that occurred more than once: 0
Analyse set of 789 weight vectors
Containing 225 true matches and 564 true non-matches
(28.52% true matches)
Identified 750 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 731 (97.47%)
2 : 16 (2.13%)
3 : 2 (0.27%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 750 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 561
Removed 1 non-pure weight vector
Final number of weight vectors to use: 788
Number of unique weight vectors: 750
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(750, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 750 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 750 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 31 matches and 54 non-matches
Purity of oracle classification: 0.635
Entropy of oracle classification: 0.947
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 665 weight vectors
Based on 31 matches and 54 non-matches
Classified 150 matches and 515 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6352941176470588, 0.9465202215633438, 0.36470588235294116)
(515, 0.6352941176470588, 0.9465202215633438, 0.36470588235294116)
Current size of match and non-match training data sets: 31 / 54
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.95
- Size 515 weight vectors
- Estimated match proportion 0.365
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 515 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.433, 0.667, 0.500, 0.636, 0.421] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.667, 0.000, 0.500, 0.600, 0.353, 0.611, 0.526] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 6 matches and 70 non-matches
Purity of oracle classification: 0.921
Entropy of oracle classification: 0.398
Number of true matches: 6
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)343_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (20, 1 - acm diverg, 343), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)343_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1026
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1026 weight vectors
Containing 198 true matches and 828 true non-matches
(19.30% true matches)
Identified 984 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 949 (96.44%)
2 : 32 (3.25%)
3 : 2 (0.20%)
7 : 1 (0.10%)
Identified 0 non-pure unique weight vectors (from 984 unique weight vectors)
Pureness (fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 176
0.000 : 808
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1026
Number of unique weight vectors: 984
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(984, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 984 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 984 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and misclassify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
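The purity and entropy figures reported after each oracle round follow directly from the match/non-match counts. A minimal sketch, assuming base-2 Shannon entropy over the match proportion (an assumption, but it reproduces the logged values):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity (majority-class fraction) and binary entropy of an
    oracle-labelled cluster, e.g. 26 matches / 61 non-matches."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                      # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(26, 61)
print(round(purity, 3), round(entropy, 3))  # 0.701 0.88
```

The same function reproduces the later rounds as well, e.g. 42 matches and 1 non-match give purity 0.977 and entropy 0.159.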
SVM classification of 897 weight vectors
Based on 26 matches and 61 non-matches
Classified 93 matches and 804 non-matches
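The SVM step splits the remaining unlabelled weight vectors of the cluster using the oracle-labelled sample as training data. A sketch with scikit-learn's `SVC`; the kernel, parameters, and of course the data here are assumptions, since the log only shows the counts:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the logged sizes: 87 labelled 7-dimensional
# weight vectors (26 matches, 61 non-matches) and 897 unlabelled ones.
train_X = rng.random((87, 7))
train_y = np.array([1] * 26 + [0] * 61)
rest_X = rng.random((897, 7))

clf = SVC(kernel="linear")            # assumption: kernel is not logged
clf.fit(train_X, train_y)
pred = clf.predict(rest_X)

# The two predicted groups become the new sub-clusters in the queue.
match_cluster = rest_X[pred == 1]
non_match_cluster = rest_X[pred == 0]
```

With the real similarity vectors this is what yields the "Classified 93 matches and 804 non-matches" split that feeds the two queue entries of the next loop.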
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(93, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(804, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 93 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 93 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
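The "farthest first" selection used to draw these samples is the classic greedy k-center heuristic: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A pure-Python sketch; the seed point and the Euclidean metric are assumptions, as the log does not show them:

```python
import math

def farthest_first(vectors, k):
    """Greedy k-center selection: each pick maximizes the minimum
    Euclidean distance to the already-selected vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]             # assumption: seed with the first vector
    min_d = [dist(vectors[0], v) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], dist(vectors[idx], v))
    return selected

sample = farthest_first([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.0, 1.0]], 3)
print(sample)  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

The spread of the picks explains why the samples above mix near-all-1.0 vectors with near-all-0.0 ones: the heuristic favours mutually distant points over representative ones.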
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and misclassify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(15)17_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 17), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)17_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 732
Number of entity ID pairs that occurred more than once: 0
Analyse set of 732 weight vectors
Containing 219 true matches and 513 true non-matches
(29.92% true matches)
Identified 677 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 641 (94.68%)
2 : 33 (4.87%)
3 : 2 (0.30%)
19 : 1 (0.15%)
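The occurrence statistics above amount to a counter of counters: count how often each weight vector occurs, then tabulate how many vectors occur once, twice, and so on. A toy sketch with hypothetical 2-dimensional vectors (the real ones are 7-dimensional):

```python
from collections import Counter

# Hypothetical toy data; tuples so the vectors are hashable.
vectors = [
    (1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.2, 0.8),
    (1.0, 0.0), (0.5, 0.5),
]

per_vector = Counter(vectors)                # vector -> occurrence count
distribution = Counter(per_vector.values())  # occurrence count -> #vectors
print(sorted(distribution.items()))  # [(1, 1), (2, 1), (3, 1)]
```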
Identified 1 non-pure unique weight vector (of 677 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 492
Removed 1 non-pure weight vector
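The pureness filter groups duplicate weight vectors, computes the fraction of match labels per group, and drops the minority-class copies of any non-pure group. A sketch under those assumptions, mirroring the logged 0.947 case (a group of 19 duplicates with one dissenting label):

```python
from collections import defaultdict

# Hypothetical (vector, is_match) records: one non-pure group and one pure one.
records = (
    [((0.9, 0.9), True)] * 18
    + [((0.9, 0.9), False)] * 1
    + [((0.1, 0.1), False)] * 3
)

groups = defaultdict(list)
for vec, label in records:
    groups[vec].append(label)

kept = []
for vec, labels in groups.items():
    pureness = sum(labels) / len(labels)   # fraction of match labels
    majority = pureness >= 0.5             # assumption: ties favour matches
    for label in labels:
        # Pure groups survive whole; non-pure groups keep the majority only.
        if pureness in (0.0, 1.0) or label == majority:
            kept.append((vec, label))

print(len(records), len(kept))  # 22 21
```

Here the single `False` record in the 18/19 ≈ 0.947 group is removed, just as the log removes 1 weight vector.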
Final number of weight vectors to use: 731
Number of unique weight vectors: 677
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(677, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 677 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 677 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and misclassify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 593 weight vectors
Based on 27 matches and 57 non-matches
Classified 148 matches and 445 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(445, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 445 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 445 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and misclassify 0
Classified 8 matches and 62 non-matches
Purity of oracle classification: 0.886
Entropy of oracle classification: 0.513
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)757_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 757), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)757_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (of 1036 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and misclassify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 817 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 817 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 817 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and misclassify 0
Classified 11 matches and 60 non-matches
Purity of oracle classification: 0.845
Entropy of oracle classification: 0.622
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)319_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 319), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)319_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1086
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1086 weight vectors
Containing 214 true matches and 872 true non-matches
(19.71% true matches)
Identified 1032 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 997 (96.61%)
2 : 32 (3.10%)
3 : 2 (0.19%)
19 : 1 (0.10%)
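The occurrence distribution printed above (how many unique vectors occur once, twice, etc., as a percentage of unique vectors) can be reproduced with a nested counter. A minimal sketch; the function and variable names are illustrative, not from the original tool:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each distinct weight vector occurs, then count
    how many distinct vectors share each occurrence count."""
    per_vector = Counter(map(tuple, weight_vectors))  # vector -> occurrences
    distribution = Counter(per_vector.values())       # occurrences -> #unique vectors
    total_unique = len(per_vector)
    for occ in sorted(distribution):
        n = distribution[occ]
        print(f"{occ} : {n} ({100.0 * n / total_unique:.2f}%)")
    return distribution

# Three copies of one vector plus one singleton:
# occurrence_distribution([(1.0, 0.5)] * 3 + [(0.2, 0.3)])
# prints "1 : 1 (50.00%)" and "3 : 1 (50.00%)"
```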
Identified 1 non-pure unique weight vector (from 1032 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 851
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1085
Number of unique weight vectors: 1032
Time to load and analyse the weight vector file: 0.01 sec
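The pureness filter above (dropping the single minority-class copy of the one 0.947-pure unique vector) amounts to grouping labelled vectors by value and discarding minority-class copies of any group that is not fully pure. A sketch under the assumption that input arrives as `(vector, is_match)` pairs; names are illustrative:

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """For each unique weight vector, compute pureness (fraction of copies
    labelled as matches) and drop minority-class copies of any vector that
    is neither purely match (1.0) nor purely non-match (0.0)."""
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, labels in groups.items():
        pureness = sum(labels) / len(labels)
        majority = pureness >= 0.5  # tie broken toward match (an assumption)
        for is_match in labels:
            if pureness in (0.0, 1.0) or is_match == majority:
                kept.append((vec, is_match))
    return kept
```

For example, 18 match copies plus 1 non-match copy of the same vector give pureness 18/19 ≈ 0.947, and the lone non-match copy is removed, mirroring the log above.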
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1032, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1032 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1032 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
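The "far" selection used above is a farthest-first traversal: start from a seed vector and repeatedly pick the vector whose minimum distance to the vectors already selected is largest. A minimal NumPy sketch; the seed choice and the Euclidean metric are assumptions, not confirmed details of the original tool:

```python
import numpy as np

def farthest_first(vectors, k, seed_index=0):
    """Select k row indices by farthest-first traversal under Euclidean distance."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [seed_index]
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = np.linalg.norm(vectors - vectors[seed_index], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # vector farthest from all selected so far
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

This greedy scheme spreads the sample across the cluster, which is why the printed selections mix high-similarity (likely match) and low-similarity (likely non-match) vectors.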
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
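The purity and entropy figures reported for each oracle round follow directly from the match/non-match counts: purity is the majority-class fraction, entropy the binary Shannon entropy of the split. A sketch (function name illustrative):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of an oracle round."""
    total = num_matches + num_non_matches
    p = num_matches / total            # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# For the round above (23 matches, 65 non-matches):
# purity_entropy(23, 65) -> (0.7386..., 0.8287...)
```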
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 944 weight vectors
Based on 23 matches and 65 non-matches
Classified 98 matches and 846 non-matches
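The SVM step trains on the oracle-labelled sample and classifies the remaining vectors, splitting the cluster into predicted-match and predicted-non-match children (here 98 and 846). A minimal scikit-learn sketch; the kernel and parameters are assumptions, and the original implementation may differ:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Fit an SVM on the oracle-classified vectors and split the remaining
    cluster into predicted matches and predicted non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(np.asarray(train_vectors), np.asarray(train_labels))
    preds = clf.predict(np.asarray(remaining_vectors))
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches
```

Both children then re-enter the cluster queue, as the "Queue length: 2" line in the next loop shows.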
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(98, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(846, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 98 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 42
Farthest first selection of 42 weight vectors from 98 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 42 weight vectors
The oracle will correctly classify 42 weight vectors and wrongly classify 0
Classified 42 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 42 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(10)251_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 251), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)251_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 266
Number of entity ID pairs that occurred more than once: 0
Analyse set of 266 weight vectors
Containing 209 true matches and 57 true non-matches
(78.57% true matches)
Identified 235 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 220 (93.62%)
2 : 12 (5.11%)
3 : 2 (0.85%)
16 : 1 (0.43%)
Identified 1 non-pure unique weight vector (from 235 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 56
Removed 1 non-pure weight vector
Final number of weight vectors to use: 265
Number of unique weight vectors: 235
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(235, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 235 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 68
Perform initial selection using "far" method
Farthest first selection of 68 weight vectors from 235 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 38 matches and 30 non-matches
Purity of oracle classification: 0.559
Entropy of oracle classification: 0.990
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 30
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 167 weight vectors
Based on 38 matches and 30 non-matches
Classified 160 matches and 7 non-matches
Non-match cluster not large enough for required sample size
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 1
Number of manual oracle classifications performed: 68
Size, purity, entropy, and estimated match proportion of clusters in queue:
(160, 0.5588235294117647, 0.9899927915575188, 0.5588235294117647)
Current size of match and non-match training data sets: 38 / 30
Selected cluster (queue ordering: random) with:
- Purity 0.56 and entropy 0.99
- Size 160 weight vectors
- Estimated match proportion 0.559
Sample size for this cluster: 60
Farthest first selection of 60 weight vectors from 160 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 60 weight vectors
The oracle will correctly classify 60 weight vectors and wrongly classify 0
Classified 44 matches and 16 non-matches
Purity of oracle classification: 0.733
Entropy of oracle classification: 0.837
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 16
Number of false non-matches: 0
Deleted 60 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(15)239_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984127
recall 0.207358
f-measure 0.342541
da 63
dm 0
ndm 0
tp 62
fp 1
tn 4.76529e+07
fn 237
Name: (15, 1 - acm diverg, 239), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)239_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 997
Number of entity ID pairs that occurred more than once: 0
Analyse set of 997 weight vectors
Containing 202 true matches and 795 true non-matches
(20.26% true matches)
Identified 947 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 913 (96.41%)
2 : 31 (3.27%)
3 : 2 (0.21%)
16 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 947 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 774
Removed 1 non-pure weight vector
Final number of weight vectors to use: 996
Number of unique weight vectors: 947
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(947, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 947 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 947 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 860 weight vectors
Based on 26 matches and 61 non-matches
Classified 131 matches and 729 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(729, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 729 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 729 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
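The "farthest first selection" steps in this log can be sketched as a greedy traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. This is a minimal sketch; the tool's actual seeding rule and distance metric are assumptions (Euclidean distance, start at index 0).

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of weight vectors.
    Starts at index `start`, then repeatedly adds the vector whose minimum
    Euclidean distance to the already-selected set is largest."""
    selected = [start]
    # current minimum distance from each vector to the selected set
    dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        nxt = max(range(len(vectors)), key=dist.__getitem__)
        selected.append(nxt)
        for i, v in enumerate(vectors):
            dist[i] = min(dist[i], math.dist(v, vectors[nxt]))
    return selected
```

For the run above this would be called with k=72 on the cluster's 729 weight vectors, yielding a sample that spreads out over the cluster rather than concentrating near its centre.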
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and misclassify 0
Classified 9 matches and 63 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
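The oracle lines above describe a simulated manual classifier that reports each ground-truth label with a configurable accuracy. A minimal sketch (the function name and seeding are assumptions, not the tool's own API):

```python
import random

def noisy_oracle(true_labels, accuracy, seed=0):
    """Simulate a manual classifier ("oracle") with the given accuracy:
    each true label is reported correctly with probability `accuracy`
    and flipped otherwise.  At accuracy=1.0, as in this log (100.00%),
    the oracle reproduces the ground truth exactly, so false matches
    and false non-matches are both zero."""
    rng = random.Random(seed)
    return [lbl if rng.random() < accuracy else not lbl for lbl in true_labels]
```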
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
63.0
Analysing file: diverg(20)33_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 33), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)33_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 226 true matches and 857 true non-matches
(20.87% true matches)
Identified 1026 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 989 (96.39%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1026 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1026
Time to load and analyse the weight vector file: 0.01 sec
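The analysis step above (occurrence frequencies and pureness of unique weight vectors) can be sketched with a `Counter` over labelled vectors. The toy data below is purely illustrative; a unique vector is "non-pure" when its copies carry both match and non-match labels, and its minority-class copies are the ones the tool removes.

```python
from collections import Counter

# toy labelled weight vectors: (vector, is_true_match)
data = ([((1.0, 0.0), False)] * 3
        + [((1.0, 1.0), True)] * 2
        + [((0.5, 0.5), True), ((0.5, 0.5), False)])

occurrences = Counter(vec for vec, _ in data)   # copies per unique vector
frequency = Counter(occurrences.values())       # occurrence : vector count
# pureness of a unique vector = fraction of its copies labelled as matches
pureness = {vec: sum(is_match for v, is_match in data if v == vec) / n
            for vec, n in occurrences.items()}
# vectors with pureness strictly between 0 and 1 are non-pure
non_pure = [vec for vec, p in pureness.items() if 0.0 < p < 1.0]
```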
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1026, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1026 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1026 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and misclassify 0
Classified 29 matches and 59 non-matches
Purity of oracle classification: 0.670
Entropy of oracle classification: 0.914
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
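The purity and entropy figures reported for each classified sample follow the usual binary definitions: purity is the majority-class fraction, and entropy is the base-2 Shannon entropy of the match proportion. The numbers in this log are reproduced exactly by:

```python
import math

def purity(matches, non_matches):
    """Fraction of the sample belonging to the majority class."""
    return max(matches, non_matches) / (matches + non_matches)

def entropy(matches, non_matches):
    """Base-2 Shannon entropy of the match proportion; 1.0 at a 50/50 split,
    0.0 for a pure sample."""
    p = matches / (matches + non_matches)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

For example, purity(29, 59) ≈ 0.670 and entropy(29, 59) ≈ 0.914, matching both the oracle-classification lines above and the Loop 2 queue entries below.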
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 938 weight vectors
Based on 29 matches and 59 non-matches
Classified 159 matches and 779 non-matches
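The SVM step trains on the oracle-labelled sample (29 matches, 59 non-matches) and then labels the remaining unclassified weight vectors, splitting the cluster. As a dependency-free stand-in for the SVM (which the tool itself uses), a simple perceptron illustrates the same train-then-classify flow on linearly separable data; all names here are illustrative, not the tool's API.

```python
def train_perceptron(X, y, epochs=50, lr=0.1):
    """Train a linear classifier on labelled vectors; y in {0, 1}.
    Returns the weight list; the last entry is the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + w[-1] > 0 else 0
            err = yi - pred  # 0 when correct; +1/-1 drives the update
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)] + [w[-1] + lr * err]
    return w

def classify(w, x):
    """Label a single vector with the trained linear model."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + w[-1] > 0 else 0
```

In the log's terms, the model is fit on the 88 oracle-classified vectors and then applied to the 938 remaining ones, producing the 159/779 split that seeds the two queue entries of Loop 2.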
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(159, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
(779, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
Current size of match and non-match training data sets: 29 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 159 weight vectors
- Estimated match proportion 0.330
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 159 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and misclassify 0
Classified 47 matches and 8 non-matches
Purity of oracle classification: 0.855
Entropy of oracle classification: 0.598
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)668_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 668), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)668_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and misclassify 0
Classified 29 matches and 59 non-matches
Purity of oracle classification: 0.670
Entropy of oracle classification: 0.914
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 29 matches and 59 non-matches
Classified 162 matches and 777 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(162, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
(777, 0.6704545454545454, 0.9144612916935675, 0.32954545454545453)
Current size of match and non-match training data sets: 29 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 162 weight vectors
- Estimated match proportion 0.330
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 162 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and misclassify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)634_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984127
recall 0.207358
f-measure 0.342541
da 63
dm 0
ndm 0
tp 62
fp 1
tn 4.76529e+07
fn 237
Name: (15, 1 - acm diverg, 634), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)634_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 649
Number of entity ID pairs that occurred more than once: 0
Analyse set of 649 weight vectors
Containing 198 true matches and 451 true non-matches
(30.51% true matches)
Identified 620 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 607 (97.90%)
2 : 10 (1.61%)
3 : 2 (0.32%)
16 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 620 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 169
0.938 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 450
Removed 1 non-pure weight vector
Final number of weight vectors to use: 648
Number of unique weight vectors: 620
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(620, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 620 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 620 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and misclassify 0
Classified 33 matches and 50 non-matches
Purity of oracle classification: 0.602
Entropy of oracle classification: 0.970
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
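The purity and entropy figures above can be reproduced from the oracle's match/non-match counts. The sketch below is a plausible reconstruction, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion (33 matches out of 83 yields the 0.602 / 0.970 reported above).

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary Shannon
    entropy of the match proportion (0 for a pure cluster, 1 for a
    50/50 split)."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```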
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 537 weight vectors
Based on 33 matches and 50 non-matches
Classified 155 matches and 382 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(155, 0.6024096385542169, 0.9695235828220428, 0.39759036144578314)
(382, 0.6024096385542169, 0.9695235828220428, 0.39759036144578314)
Current size of match and non-match training data sets: 33 / 50
Selected cluster with (queue ordering: random):
- Purity 0.60 and entropy 0.97
- Size 382 weight vectors
- Estimated match proportion 0.398
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 382 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.409, 0.654, 0.500, 0.516, 0.333] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.296, 0.600, 0.471, 0.600, 0.643] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.817, 1.000, 0.250, 0.212, 0.256, 0.045, 0.250] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.522, 0.786, 0.800, 0.824, 0.667] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.632, 0.750, 0.696, 0.682, 0.450] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.500, 0.452, 0.632, 0.714, 0.667] (False)
[0.790, 0.000, 0.636, 0.619, 0.429, 0.450, 0.609] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.786, 0.833, 0.545, 0.478, 0.346] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 0.000, 0.667, 0.722, 0.353, 0.545, 0.800] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
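The "farthest first selection" step above names a greedy traversal: start from one vector and repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and a random starting vector (the tool's actual seeding and metric are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal over a 2-D array of weight
    vectors; returns the indices of the k selected vectors."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vectors)))]
    # Minimum distance from every vector to the selected set so far.
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))  # farthest from all selected
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```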
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and misclassify 0
Classified 0 matches and 74 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
63.0
Analyzing file: diverg(15)609_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 609), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)609_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 754
Number of entity ID pairs that occurred more than once: 0
Analyse set of 754 weight vectors
Containing 222 true matches and 532 true non-matches
(29.44% true matches)
Identified 718 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 699 (97.35%)
2 : 16 (2.23%)
3 : 2 (0.28%)
17 : 1 (0.14%)
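The occurrence distribution above (how many unique weight vectors appear once, twice, and so on) can be computed with two counting passes. A minimal sketch using `collections.Counter`:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map 'occurrence count' -> 'number of unique weight vectors that
    occur exactly that often', as printed in the log."""
    per_vector = Counter(tuple(v) for v in weight_vectors)  # vector -> count
    return Counter(per_vector.values())                     # count -> how many vectors
```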
Identified 1 non-pure unique weight vector (from 718 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 529
Removed 1 non-pure weight vector
Final number of weight vectors to use: 753
Number of unique weight vectors: 718
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(718, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 718 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 718 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and misclassify 0
Classified 30 matches and 54 non-matches
Purity of oracle classification: 0.643
Entropy of oracle classification: 0.940
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 634 weight vectors
Based on 30 matches and 54 non-matches
Classified 146 matches and 488 non-matches
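The SVM step trains on the oracle-labelled matches and non-matches and then splits the remaining weight vectors into two child clusters. As a stand-in for whatever SVM library the tool actually uses, here is a minimal linear SVM trained by sub-gradient descent on the regularised hinge loss; learning rate, regularisation strength, and epoch count are illustrative assumptions:

```python
import random

def train_linear_svm(X, y, epochs=500, lr=0.05, lam=0.01, seed=1):
    """Minimal linear SVM via sub-gradient descent on the hinge loss.
    X: list of feature vectors; y: labels in {-1, +1}."""
    random.seed(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for i in random.sample(range(len(X)), len(X)):
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1.0:
                # Hinge is active: step towards y_i * x_i, shrink by lam.
                w = [wj - lr * (lam * wj - y[i] * xj)
                     for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                # Only the regulariser contributes to the sub-gradient.
                w = [wj * (1.0 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    """Sign of the decision function: +1 for match, -1 for non-match."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1
```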
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
(488, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
Current size of match and non-match training data sets: 30 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 146 weight vectors
- Estimated match proportion 0.357
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 146 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and misclassify 0
Classified 53 matches and 2 non-matches
Purity of oracle classification: 0.964
Entropy of oracle classification: 0.225
Number of true matches: 53
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analyzing file: diverg(20)265_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 265), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)265_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
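The non-pure-vector clean-up reported above can be sketched as follows: group identical weight vectors, compute each group's pureness (the fraction of its copies labelled as matches), and drop the minority-class copies of any non-pure group. This is a reconstruction from the log output; resolving a pureness of exactly 0.5 in favour of matches is an assumption.

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """Keep pure weight vectors intact; for non-pure ones, keep only
    the copies carrying the majority label. Returns (vector, label)
    pairs that survive the filter."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, lab in zip(vectors, labels):
        labs = groups[tuple(vec)]
        pureness = sum(labs) / len(labs)          # fraction of matches
        majority_is_match = pureness >= 0.5       # assumption: ties -> match
        if pureness in (0.0, 1.0) or bool(lab) == majority_is_match:
            kept.append((vec, lab))
    return kept
```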
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and misclassify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 101 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and misclassify 0
Classified 42 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analyzing file: diverg(10)709_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 709), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)709_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 857
Number of entity ID pairs that occurred more than once: 0
Analyse set of 857 weight vectors
Containing 187 true matches and 670 true non-matches
(21.82% true matches)
Identified 817 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 783 (95.84%)
2 : 31 (3.79%)
3 : 2 (0.24%)
6 : 1 (0.12%)
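A frequency distribution like the one above can be reproduced by hashing each weight vector and counting duplicates. A minimal sketch with `collections.Counter` (the vectors below are illustrative, not from the log):

```python
from collections import Counter

# Hypothetical miniature set of weight vectors (tuples, so they are hashable).
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.8), (0.2, 0.8), (0.2, 0.8), (0.9, 0.1)]

counts = Counter(vectors)        # vector -> how often it occurs
freq = Counter(counts.values())  # occurrence count -> number of vectors with that count
total_unique = len(counts)

for occ in sorted(freq):
    n = freq[occ]
    print(f"{occ} : {n} ({100.0 * n / total_unique:.2f}%)")
```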
Identified 0 non-pure unique weight vectors (from 817 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 167
0.000 : 650
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 857
Number of unique weight vectors: 817
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(817, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 817 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 817 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
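The "farthest first" selection logged above is typically implemented as a greedy traversal that repeatedly adds the vector with the largest minimum distance to the already-selected set. A minimal sketch, assuming Euclidean distance (the log does not state the metric):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly add the vector whose
    minimum Euclidean distance to the already-selected set is largest."""
    selected = [vectors[0]]  # arbitrary start point
    min_dist = [math.dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        # Update each vector's distance to its nearest selected vector.
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[idx]))
    return selected

# Opposite corners of the unit square are picked before the centre point.
demo = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5)], 3)
```

Already-selected vectors have a minimum distance of zero, so they are never picked twice as long as `k` does not exceed the number of distinct vectors.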
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
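The purity and entropy figures reported after each oracle step can be recomputed from the match/non-match counts: purity is the majority-class fraction, and entropy is the binary Shannon entropy (in bits) of the match proportion. A sketch using the counts 28/58 from above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon entropy
    of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(28, 58)  # counts from the oracle step above
print(f"{purity:.3f} {entropy:.3f}")      # 0.674 0.910, as logged
```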
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 731 weight vectors
Based on 28 matches and 58 non-matches
Classified 150 matches and 581 non-matches
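The SVM step trains on the oracle-labelled vectors and then classifies the remaining unlabelled ones. A hedged sketch with scikit-learn's `SVC` (the actual kernel and parameters are not stated in the log, and the data below is illustrative):

```python
from sklearn.svm import SVC

# Toy stand-ins for oracle-labelled weight vectors (1 = match, 0 = non-match).
labelled_vectors = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]

# Kernel choice is an assumption; the log only says "SVM classification".
clf = SVC(kernel="linear")
clf.fit(labelled_vectors, labels)

# Classify the still-unlabelled weight vectors; the two resulting groups
# become the new (match / non-match) clusters pushed onto the queue.
unlabelled = [[0.95, 0.9], [0.05, 0.1]]
predicted = clf.predict(unlabelled)
```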
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(581, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 581 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 581 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.667, 0.273, 0.583, 0.444, 0.727] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 0 matches and 74 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing the file: diverg(20)777_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 777), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)777_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1082
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1082 weight vectors
Containing 209 true matches and 873 true non-matches
(19.32% true matches)
Identified 1035 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1000 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1035 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1081
Number of unique weight vectors: 1035
Time to load and analyse the weight vector file: 0.01 sec
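The pureness handling logged above (removing minority-class copies of non-pure unique weight vectors, such as the single record dropped from the vector with pureness 0.917) can be sketched as follows, on illustrative data:

```python
from collections import defaultdict

# Hypothetical records: (weight vector, is_match) pairs, with duplicates.
records = [
    ((1.0, 0.9), True), ((1.0, 0.9), True),
    ((0.5, 0.5), True), ((0.5, 0.5), False), ((0.5, 0.5), False),
    ((0.1, 0.0), False),
]

# Group the match labels of all occurrences of each unique vector.
label_groups = defaultdict(list)
for vec, is_match in records:
    label_groups[vec].append(is_match)

kept = []
for vec, flags in label_groups.items():
    pureness = sum(flags) / len(flags)    # fraction of match labels
    majority_is_match = pureness >= 0.5
    for is_match in flags:
        # Pure vectors keep all copies; non-pure vectors keep only the
        # majority-class copies.
        if pureness in (0.0, 1.0) or is_match == majority_is_match:
            kept.append((vec, is_match))

print(len(records) - len(kept), "minority-class record(s) removed")
```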
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1035, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1035 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1035 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 947 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 846 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(846, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 846 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 846 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(10)75_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 75), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)75_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 768
Number of entity ID pairs that occurred more than once: 0
Analyse set of 768 weight vectors
Containing 222 true matches and 546 true non-matches
(28.91% true matches)
Identified 714 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 677 (94.82%)
2 : 34 (4.76%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 714 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 525
Removed 1 non-pure weight vector
Final number of weight vectors to use: 767
Number of unique weight vectors: 714
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(714, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 714 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 714 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
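The purity and entropy values reported above follow directly from the match/non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A minimal sketch (the function name is illustrative, not taken from the original code):

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity: fraction of the majority class.
    Entropy: binary Shannon entropy of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Counts reported for the 84 oracle-classified vectors above:
purity, entropy = purity_and_entropy(27, 57)
print(round(purity, 3), round(entropy, 3))  # 0.679 0.906
```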
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 630 weight vectors
Based on 27 matches and 57 non-matches
Classified 149 matches and 481 non-matches
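The SVM step above trains on the oracle-labelled weight vectors and splits the remaining unlabelled vectors into predicted matches and non-matches. A sketch with scikit-learn; the kernel and parameters are assumptions, as the original settings are not shown in the log:

```python
import numpy as np
from sklearn import svm

def svm_split(train_vectors, train_labels, unlabelled_vectors):
    """Train an SVM on labelled weight vectors, then partition the
    unlabelled vectors into predicted matches and non-matches.
    The linear kernel is an assumption."""
    clf = svm.SVC(kernel="linear")
    clf.fit(np.asarray(train_vectors), np.asarray(train_labels))
    pred = clf.predict(np.asarray(unlabelled_vectors))
    matches = [v for v, p in zip(unlabelled_vectors, pred) if p]
    non_matches = [v for v, p in zip(unlabelled_vectors, pred) if not p]
    return matches, non_matches
```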
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(481, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 481 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 481 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.714, 0.727, 0.750, 0.294, 0.833] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.348, 0.429, 0.526, 0.529, 0.619] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
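The farthest-first selections above can be sketched as follows: start from one vector, then repeatedly pick the vector whose minimum Euclidean distance to the already-selected set is largest. This is an illustrative implementation, not the original code:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: select k vectors that are
    maximally spread out under Euclidean distance."""
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        # Pick the remaining vector farthest from its nearest selected vector.
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```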
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 9 matches and 62 non-matches
Purity of oracle classification: 0.873
Entropy of oracle classification: 0.548
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)691_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 691), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)691_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as the fraction of matches) of each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
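The frequency distribution above counts how often each unique weight vector occurs, then tabulates how many vectors share each occurrence count. A small sketch using `collections.Counter` (the function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of unique weight
    vectors that occur that often."""
    counts = Counter(tuple(v) for v in vectors)  # vector -> occurrences
    return Counter(counts.values())              # occurrences -> number of vectors

dist = occurrence_distribution([[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]])
print(dict(dist))  # {2: 1, 1: 1}
```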
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 153 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 153 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)406_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 406), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)406_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1044
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1044 weight vectors
Containing 225 true matches and 819 true non-matches
(21.55% true matches)
Identified 987 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 950 (96.25%)
2 : 34 (3.44%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 987 unique weight vectors)
Pureness (as the fraction of matches) of each unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 798
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1043
Number of unique weight vectors: 987
Time to load and analyse the weight vector file: 0.01 sec
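The pureness values reported above give, for each unique weight vector, the fraction of its occurrences labelled as a match; vectors that are neither purely matches (1.000) nor purely non-matches (0.000) have copies removed. A sketch of the pureness computation (function name illustrative):

```python
from collections import defaultdict

def pureness(labelled_vectors):
    """labelled_vectors: iterable of (vector_tuple, is_match).
    Returns the match fraction per unique weight vector."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for vec, is_match in labelled_vectors:
        totals[vec] += 1
        matches[vec] += int(is_match)
    return {vec: matches[vec] / totals[vec] for vec in totals}

# A vector occurring 20 times, 19 of them as a match, has pureness 0.950,
# as in the table above:
p = pureness([((1.0, 0.9), True)] * 19 + [((1.0, 0.9), False)])
print(round(p[(1.0, 0.9)], 3))  # 0.95
```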
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(987, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 987 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 987 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 32 matches and 55 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 900 weight vectors
Based on 32 matches and 55 non-matches
Classified 329 matches and 571 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(329, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
(571, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
Current size of match and non-match training data sets: 32 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 571 weight vectors
- Estimated match proportion 0.368
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 571 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.571, 0.333, 0.444, 0.400] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)421_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (15, 1 - acm diverg, 421), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)421_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 898
Number of entity ID pairs that occurred more than once: 0
Analyse set of 898 weight vectors
Containing 155 true matches and 743 true non-matches
(17.26% true matches)
Identified 862 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 834 (96.75%)
2 : 25 (2.90%)
3 : 2 (0.23%)
8 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 862 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 139
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 722
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 890
Number of unique weight vectors: 861
Time to load and analyse the weight vector file: 0.01 sec
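The pureness step above groups identical weight vectors and computes the fraction of True labels per group; groups that are neither all-match nor all-non-match are non-pure. The runs in this log differ in how much they delete (this file removed all 8 occurrences; later files remove only the minority class), so the function below is an illustrative sketch of the minority-removal variant only — its name and data layout are not from the original code:

```python
from collections import defaultdict

def remove_non_pure(vectors, labels):
    """Group identical weight vectors, compute each group's pureness
    (fraction of True labels), and drop minority-class occurrences
    from any non-pure group (sketch; the original may drop the whole group)."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, labs in groups.items():
        pureness = sum(labs) / len(labs)
        if 0.0 < pureness < 1.0:
            # Non-pure group: keep only the majority class.
            majority = pureness >= 0.5
            labs = [l for l in labs if l == majority]
        kept.extend((list(vec), l) for l in labs)
    return kept

# Toy example: one vector occurs 8 times with 7 True / 1 False
# (pureness 0.875), so its single False occurrence is removed.
data = [([0.9, 1.0], True)] * 7 + [([0.9, 1.0], False)] + [([0.1, 0.0], False)] * 3
vectors = [v for v, _ in data]
labels = [l for _, l in data]
print(len(remove_non_pure(vectors, labels)))  # 10
```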
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(861, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 861 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 861 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
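The "far" selection listed above is a farthest-first traversal: greedily add the vector whose distance to its nearest already-selected vector is largest. A dependency-free sketch (the starting vector and the Euclidean metric are assumptions; the original implementation may differ):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector,
    then repeatedly add the vector with the largest distance to its
    nearest already-selected vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]
    # Distance of every vector to its nearest selected vector so far.
    nearest = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=nearest.__getitem__)
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            nearest[i] = min(nearest[i], dist(v, vectors[idx]))
    return selected

pts = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.5, 0.5]]
print(farthest_first(pts, 3))  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```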
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 25 matches and 61 non-matches
Purity of oracle classification: 0.709
Entropy of oracle classification: 0.870
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
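The purity and entropy figures above reduce to the majority-class share and the binary entropy of the match proportion; a sketch reproducing the numbers for this oracle step:

```python
import math

def purity_entropy(m, n):
    """Purity = share of the majority class; entropy = binary entropy
    of the match proportion (with 0*log 0 taken as 0)."""
    p = m / (m + n)
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 25 matches / 61 non-matches, as classified by the oracle above.
purity, entropy = purity_entropy(25, 61)
print(round(purity, 3), round(entropy, 3))  # 0.709 0.87
```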
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 775 weight vectors
Based on 25 matches and 61 non-matches
Classified 87 matches and 688 non-matches
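After the oracle step, the remaining unclassified vectors are split into two child clusters by a classifier trained on the freshly labelled matches and non-matches (an SVM in this run). As a dependency-free stand-in, the sketch below uses a nearest-centroid rule instead of an SVM — illustrative of the split only, not of the original classifier:

```python
def split_cluster(unlabelled, match_seeds, nonmatch_seeds):
    """Split a cluster by assigning each vector to the nearer of the two
    class centroids built from the oracle-labelled seed vectors.
    (The logged run trains an SVM; nearest-centroid is a stand-in.)"""
    def centroid(vecs):
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cm, cn = centroid(match_seeds), centroid(nonmatch_seeds)
    matches, nonmatches = [], []
    for v in unlabelled:
        (matches if sqdist(v, cm) < sqdist(v, cn) else nonmatches).append(v)
    return matches, nonmatches

m, n = split_cluster([[0.9, 0.9], [0.1, 0.2], [0.8, 0.7]],
                     match_seeds=[[1.0, 1.0], [0.9, 0.8]],
                     nonmatch_seeds=[[0.0, 0.1], [0.2, 0.0]])
print(len(m), len(n))  # 2 1
```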
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(87, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
(688, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
Current size of match and non-match training data sets: 25 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 87 weight vectors
- Estimated match proportion 0.291
Sample size for this cluster: 42
Farthest first selection of 42 weight vectors from 87 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.952, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 42 weight vectors
The oracle will correctly classify 42 weight vectors and wrongly classify 0
Classified 41 matches and 1 non-match
Purity of oracle classification: 0.976
Entropy of oracle classification: 0.162
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 42 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing the file: diverg(20)395_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 395), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)395_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 226 true matches and 857 true non-matches
(20.87% true matches)
Identified 1026 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 989 (96.39%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1026 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1026
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1026, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1026 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1026 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 30 matches and 58 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 938 weight vectors
Based on 30 matches and 58 non-matches
Classified 177 matches and 761 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(177, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
(761, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
Current size of match and non-match training data sets: 30 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 177 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 177 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.929, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 44 matches and 14 non-matches
Purity of oracle classification: 0.759
Entropy of oracle classification: 0.797
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 14
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)208_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 208), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)208_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 714
Number of entity ID pairs that occurred more than once: 0
Analyse set of 714 weight vectors
Containing 201 true matches and 513 true non-matches
(28.15% true matches)
Identified 669 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 635 (94.92%)
2 : 31 (4.63%)
3 : 2 (0.30%)
11 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 669 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 492
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 713
Number of unique weight vectors: 669
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(669, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 669 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 669 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 26 matches and 58 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.893
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 585 weight vectors
Based on 26 matches and 58 non-matches
Classified 123 matches and 462 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
(462, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
Current size of match and non-match training data sets: 26 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 123 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
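The farthest-first selection used for these samples can be sketched as a greedy traversal (the seed choice and Euclidean distance here are assumptions; the program's exact variant may differ):

```python
def farthest_first(vectors, k):
    """Greedily pick k vectors: start from the first one, then repeatedly
    add the vector whose minimum distance to the selected set is largest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < min(k, len(vectors)):
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        remaining.remove(best)
        selected.append(best)
    return selected
```

Because each pick maximises the distance to everything already chosen, the sample spreads across the whole cluster rather than concentrating in one region.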
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(15)29_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (15, 1 - acm diverg, 29), dtype: object
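The precision, recall, and f-measure fields in the Series above are the standard quantities derived from tp/fp/fn; a quick check (the helper name is illustrative):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f_measure
```

With tp=68, fp=1, fn=231 this reproduces the logged values: precision ≈ 0.985507, recall ≈ 0.227425, f-measure ≈ 0.369565.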
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)29_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 297
Number of entity ID pairs that occurred more than once: 0
Analyse set of 297 weight vectors
Containing 183 true matches and 114 true non-matches
(61.62% true matches)
Identified 276 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of unique weight vectors that occur that often
1 : 261 (94.57%)
2 : 12 (4.35%)
3 : 2 (0.72%)
6 : 1 (0.36%)
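This occurrence distribution can be tallied with two nested counts — one over the vectors, one over the counts themselves — for example with `collections.Counter` (a sketch, not the program's own code):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique weight vectors that
    occur exactly that often."""
    per_vector = Counter(map(tuple, weight_vectors))
    return Counter(per_vector.values())
```

For the 297 vectors above this corresponds to {1: 261, 2: 12, 3: 2, 6: 1}, i.e. 276 unique vectors in total.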
Identified 0 non-pure unique weight vectors (from 276 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 164
0.000 : 112
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 297
Number of unique weight vectors: 276
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(276, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 276 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 71
Perform initial selection using "far" method
Farthest first selection of 71 weight vectors from 276 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 36 matches and 35 non-matches
Purity of oracle classification: 0.507
Entropy of oracle classification: 1.000
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 35
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 205 weight vectors
Based on 36 matches and 35 non-matches
Classified 140 matches and 65 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 71
Size, purity, entropy, and estimated match proportion of clusters in queue:
(140, 0.5070422535211268, 0.9998568991526107, 0.5070422535211268)
(65, 0.5070422535211268, 0.9998568991526107, 0.5070422535211268)
Current size of match and non-match training data sets: 36 / 35
Selected cluster (queue ordering: random) with:
- Purity 0.51 and entropy 1.00
- Size 65 weight vectors
- Estimated match proportion 0.507
Sample size for this cluster: 39
Farthest first selection of 39 weight vectors from 65 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.778, 0.636, 0.375, 0.556, 0.625] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
Perform oracle with 100.00% accuracy on 39 weight vectors
The oracle will correctly classify 39 weight vectors and wrongly classify 0
Classified 0 matches and 39 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 39 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing the file: diverg(20)367_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 367), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)367_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 810
Number of entity ID pairs that occurred more than once: 0
Analyse set of 810 weight vectors
Containing 223 true matches and 587 true non-matches
(27.53% true matches)
Identified 756 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of unique weight vectors that occur that often
1 : 719 (95.11%)
2 : 34 (4.50%)
3 : 2 (0.26%)
17 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 756 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 566
Removed 1 non-pure weight vector
Final number of weight vectors to use: 809
Number of unique weight vectors: 756
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(756, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 756 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 756 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 671 weight vectors
Based on 27 matches and 58 non-matches
Classified 94 matches and 577 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(577, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 577 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 577 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 20 matches and 53 non-matches
Purity of oracle classification: 0.726
Entropy of oracle classification: 0.847
Number of true matches: 20
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing the file: diverg(20)844_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 844), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)844_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1035
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1035 weight vectors
Containing 223 true matches and 812 true non-matches
(21.55% true matches)
Identified 981 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of unique weight vectors that occur that often
1 : 944 (96.23%)
2 : 34 (3.47%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 981 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 791
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1034
Number of unique weight vectors: 981
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(981, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 981 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 981 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
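The purity and entropy values logged above follow the standard two-class definitions: purity is the majority-class fraction and entropy is the Shannon entropy in bits. A minimal sketch (function name is illustrative, not from the original program) that reproduces the reported figures:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Two-class cluster purity (majority fraction) and Shannon entropy (bits)."""
    total = num_matches + num_non_matches
    p_match = num_matches / total
    purity = max(p_match, 1.0 - p_match)
    entropy = 0.0
    for p in (p_match, 1.0 - p_match):
        if p > 0.0:
            entropy -= p * math.log2(p)
    return purity, entropy

# Reproduces the figures logged above for 30 matches / 57 non-matches:
purity, entropy = purity_entropy(30, 57)
print(round(purity, 3), round(entropy, 3))  # 0.655 0.929
```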
SVM classification of 894 weight vectors
Based on 30 matches and 57 non-matches
Classified 160 matches and 734 non-matches
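The SVM step trains on the oracle-labelled vectors and predicts labels for the remainder of the cluster. A minimal sketch with scikit-learn's `SVC` (the arrays below are random stand-ins for the real data, and the original program's kernel and parameters are not shown in the log):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the oracle-labelled training data
# (30 matches, 57 non-matches) and the 894 still-unlabelled vectors.
X_train = rng.random((87, 7))
y_train = np.array([True] * 30 + [False] * 57)
X_rest = rng.random((894, 7))

# Train on the labelled samples, then classify everything left in the cluster.
clf = SVC().fit(X_train, y_train)
pred = clf.predict(X_rest)
print(f"Classified {pred.sum()} matches and {(~pred).sum()} non-matches")
```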
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(160, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(734, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 734 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 734 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 3 matches and 74 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.238
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analyzing file: diverg(20)351_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 351), dtype: object
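The precision, recall, and f-measure values in the dump above follow the standard definitions over the confusion counts (tp, fp, fn); a minimal sketch that reproduces them:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F-measure from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Reproduces the figures in the dump above (tp=42, fp=0, fn=257):
p, r, f = precision_recall_f1(42, 0, 257)
print(p, round(r, 6), round(f, 6))  # 1.0 0.140468 0.246334
```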
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)351_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 689
Number of entity ID pairs that occurred more than once: 0
Analyse set of 689 weight vectors
Containing 219 true matches and 470 true non-matches
(31.79% true matches)
Identified 656 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 640 (97.56%)
2 : 13 (1.98%)
3 : 2 (0.30%)
17 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 656 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 186
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 469
Removed 1 non-pure weight vector
Final number of weight vectors to use: 688
Number of unique weight vectors: 656
Time to load and analyse the weight vector file: 0.01 sec
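The occurrence frequency distribution reported above can be computed by counting duplicate vectors; a sketch with `collections.Counter` (the vectors below are illustrative, not the actual 7-dimensional CSV rows):

```python
from collections import Counter

# Hypothetical weight vectors; the real ones are 7-dimensional rows
# loaded from the CSV named in the log.
vectors = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.2, 0.8), (0.2, 0.8), (0.2, 0.8)]

occurrences = Counter(map(tuple, vectors))    # vector -> how often it occurs
distribution = Counter(occurrences.values())  # occurrence count -> number of vectors

total_unique = len(occurrences)
for count in sorted(distribution):
    n = distribution[count]
    print(f"{count} : {n} ({100.0 * n / total_unique:.2f}%)")
```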
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(656, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 656 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 656 vectors
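Farthest-first selection greedily adds, at each step, the vector whose distance to its nearest already-selected vector is largest. A minimal sketch (Euclidean distance and a random start point are assumptions; the program's actual metric and seeding are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first selection of k row indices from `vectors`."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]  # arbitrary starting point
    # Distance of every vector to its nearest selected vector so far.
    dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(dist.argmax())  # farthest from the current selection
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

picks = farthest_first(np.random.default_rng(1).random((656, 7)), 84)
print(len(picks), len(set(picks)))  # 84 84
```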
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 572 weight vectors
Based on 29 matches and 55 non-matches
Classified 145 matches and 427 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(427, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 145 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 145 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analyzing file: diverg(20)761_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 761), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)761_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(20)778_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 778), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)778_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
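The occurrence counts and the pureness-based removal above can be sketched as follows. This is a simplified illustration: `vectors` and `labels` are hypothetical inputs (one similarity vector and one true match/non-match label per record pair), and treating any pureness strictly between 0 and 1 as non-pure is an assumption:

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Count duplicate weight vectors and, per unique vector, the
    fraction of its occurrences labelled as matches (its "pureness").
    Minority-class copies of non-pure vectors would then be removed."""
    keys = [tuple(v) for v in vectors]
    occ = Counter(keys)                     # occurrence frequency
    match_count = Counter()
    for key, is_match in zip(keys, labels):
        if is_match:
            match_count[key] += 1
    pureness = {v: match_count[v] / n for v, n in occ.items()}
    non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]
    return occ, pureness, non_pure
```

For example, a vector occurring 20 times with 19 match labels has pureness 0.950, matching the single non-pure vector reported above.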
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
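The "far" initial selection is a greedy farthest-first traversal: after seeding with one vector, it repeatedly adds the vector whose minimum distance to the already-selected set is largest. A sketch, assuming Euclidean distance and a random seed vector (the log does not show the actual distance metric or seeding rule):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedily select k vectors, each maximising the minimum
    Euclidean distance to the vectors selected so far."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    first = int(rng.integers(len(vectors)))
    selected = [first]
    # min distance from every vector to the selected set so far
    min_dist = np.linalg.norm(vectors - vectors[first], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

This spreads the sample across the cluster, which is why the selected vectors above mix clear matches and clear non-matches.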
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
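The oracle-labelled vectors then serve as training data for classifying the remaining vectors in the cluster. The log does not state which SVM kernel or parameters were used; a minimal sketch with scikit-learn (an assumed dependency, with a linear kernel chosen for illustration):

```python
import numpy as np
from sklearn.svm import SVC

def svm_classify(train_vecs, train_labels, remaining_vecs):
    """Fit an SVM on the oracle-classified weight vectors and predict
    match/non-match for the remaining, unclassified vectors."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(np.asarray(train_vecs), np.asarray(train_labels))
    return clf.predict(np.asarray(remaining_vecs))
```

The predicted matches and non-matches form the two child clusters pushed onto the queue for the next loop.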
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)592_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 592), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)592_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 831
Number of entity ID pairs that occurred more than once: 0
Analyse set of 831 weight vectors
Containing 227 true matches and 604 true non-matches
(27.32% true matches)
Identified 774 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 737 (95.22%)
2 : 34 (4.39%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 774 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 583
Removed 1 non-pure weight vector
Final number of weight vectors to use: 830
Number of unique weight vectors: 774
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(774, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 774 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 774 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 689 weight vectors
Based on 28 matches and 57 non-matches
Classified 151 matches and 538 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(538, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 538 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 538 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 9 matches and 64 non-matches
Purity of oracle classification: 0.877
Entropy of oracle classification: 0.539
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)642_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 642), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)642_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 956 non-matches
39.0
Analysing file: diverg(10)567_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 567), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)567_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 583
Number of entity ID pairs that occurred more than once: 0
Analyse set of 583 weight vectors
Containing 187 true matches and 396 true non-matches
(32.08% true matches)
Identified 561 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 545 (97.15%)
2 : 13 (2.32%)
3 : 2 (0.36%)
6 : 1 (0.18%)
Identified 0 non-pure unique weight vectors (from 561 unique weight vectors)
Pureness (as proportion of matches) of each unique weight vector:
Pureness : Count
1.000 : 167
0.000 : 394
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 583
Number of unique weight vectors: 561
Time to load and analyse the weight vector file: 0.01 sec
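The uniqueness and frequency analysis logged above (583 vectors, 561 unique, most occurring once) can be sketched with a plain `Counter`; the actual implementation is not shown in the log, so this is an assumed reconstruction:

```python
from collections import Counter

def analyse_vectors(vectors):
    """Count unique weight vectors and build the occurrence
    frequency distribution reported in the log."""
    counts = Counter(tuple(v) for v in vectors)
    freq_dist = Counter(counts.values())  # occurrence -> number of unique vectors
    return len(counts), dict(freq_dist)

# tiny synthetic example: 6 vectors, 4 unique, one appearing 3 times
vecs = [(0.1,), (0.1,), (0.1,), (0.5,), (0.7,), (1.0,)]
n_unique, dist = analyse_vectors(vecs)
```

Applied to the real file this yields the distribution above (1 : 545, 2 : 13, 3 : 2, 6 : 1, summing to 561 unique out of 583 total).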
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(561, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 561 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 561 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.750, 0.905, 0.667, 0.500, 0.571] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
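The farthest-first selections logged above greedily pick, at each step, the vector whose minimum distance to the already-selected set is largest. A minimal sketch of that traversal; the distance metric (Euclidean) and the starting index are assumptions, since the log does not specify them:

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly add the vector
    whose minimum distance to the selected set is largest."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # distance of every vector to its nearest selected vector so far
    dists = np.linalg.norm(vectors - vectors[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

On a 1-D toy set `[[0.0], [0.1], [0.5], [1.0]]` with `k=3` this selects indices `[0, 3, 2]`, i.e. the two extremes first and then the midpoint.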
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 29 matches and 53 non-matches
Purity of oracle classification: 0.646
Entropy of oracle classification: 0.937
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 479 weight vectors
Based on 29 matches and 53 non-matches
Classified 133 matches and 346 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
(346, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
Current size of match and non-match training data sets: 29 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 346 weight vectors
- Estimated match proportion 0.354
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 346 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.846, 0.542, 0.588, 0.579, 0.423] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.500, 0.667, 0.571, 0.500, 0.625] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.833, 0.727, 0.818, 0.750, 0.722] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.367, 0.667, 0.583, 0.625, 0.316] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 7 matches and 63 non-matches
Purity of oracle classification: 0.900
Entropy of oracle classification: 0.469
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing file: diverg(15)199_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (15, 1 - acm diverg, 199), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)199_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 647
Number of entity ID pairs that occurred more than once: 0
Analyse set of 647 weight vectors
Containing 215 true matches and 432 true non-matches
(33.23% true matches)
Identified 595 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 559 (93.95%)
2 : 33 (5.55%)
3 : 2 (0.34%)
16 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 595 unique weight vectors)
Pureness (as proportion of matches) of each unique weight vector:
Pureness : Count
1.000 : 183
0.938 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 411
Removed 1 non-pure weight vector
Final number of weight vectors to use: 646
Number of unique weight vectors: 595
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(595, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 595 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 595 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 28 matches and 54 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 513 weight vectors
Based on 28 matches and 54 non-matches
Classified 146 matches and 367 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6585365853658537, 0.9262122127346665, 0.34146341463414637)
(367, 0.6585365853658537, 0.9262122127346665, 0.34146341463414637)
Current size of match and non-match training data sets: 28 / 54
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 367 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 367 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.714, 0.727, 0.750, 0.294, 0.833] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.348, 0.429, 0.526, 0.529, 0.619] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 10 matches and 60 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 10
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(15)582_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 582), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)582_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 452
Number of entity ID pairs that occurred more than once: 0
Analyse set of 452 weight vectors
Containing 221 true matches and 231 true non-matches
(48.89% true matches)
Identified 416 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 400 (96.15%)
2 : 13 (3.12%)
3 : 2 (0.48%)
20 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 416 unique weight vectors)
Pureness (as proportion of matches) of each unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 230
Removed 1 non-pure weight vector
Final number of weight vectors to use: 451
Number of unique weight vectors: 416
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(416, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 416 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 416 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 35 matches and 43 non-matches
Purity of oracle classification: 0.551
Entropy of oracle classification: 0.992
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 338 weight vectors
Based on 35 matches and 43 non-matches
Classified 146 matches and 192 non-matches
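The SVM step uses the 35 + 43 oracle-labelled vectors as training data and splits the remaining 338 unlabelled vectors into predicted matches and non-matches, which become the two new queue clusters. The tool's actual SVM library, kernel, and parameters are not visible in the log; a self-contained sketch of a linear SVM trained with Pegasos-style stochastic subgradient descent, on toy stand-in data:

```python
import random

def train_linear_svm(X, y, steps=5000, lam=0.01, seed=42):
    """Pegasos-style stochastic subgradient descent for a linear SVM.
    X: feature vectors, y: labels in {+1, -1}. A constant feature is
    appended to each vector so the bias is learned inside w."""
    data = [x + [1.0] for x in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    for t in range(1, steps + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]      # regularisation shrink
        if margin < 1:                                # hinge-loss subgradient step
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def classify(w, x):
    """True = predicted match, False = predicted non-match."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0])) >= 0.0

# Toy stand-ins: high-similarity vectors are matches (+1), low are non-matches (-1).
matches = [[0.9, 0.8, 0.95, 0.85, 0.9, 0.8, 0.9],
           [0.85, 0.9, 0.8, 0.9, 0.95, 0.85, 0.8]]
non_matches = [[0.1, 0.2, 0.05, 0.15, 0.1, 0.2, 0.1],
               [0.2, 0.1, 0.15, 0.05, 0.2, 0.1, 0.15]]
w = train_linear_svm(matches + non_matches, [1, 1, -1, -1])
print([classify(w, x) for x in matches + non_matches])
```

In the real run the two predicted groups (146 and 192 vectors here) are pushed back onto the cluster queue for further splitting.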
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.5512820512820513, 0.9923985003332222, 0.44871794871794873)
(192, 0.5512820512820513, 0.9923985003332222, 0.44871794871794873)
Current size of match and non-match training data sets: 35 / 43
Selected cluster (queue ordering: random) with:
- Purity 0.55 and entropy 0.99
- Size 192 weight vectors
- Estimated match proportion 0.449
Sample size for this cluster: 64
Farthest first selection of 64 weight vectors from 192 vectors
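Farthest-first selection greedily adds whichever vector lies farthest from everything already selected, so the labelled sample spans the whole cluster rather than one dense region. A minimal sketch (seeding with the first vector and using Euclidean distance are assumptions; the tool's exact choices are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors: repeatedly take the vector whose
    distance to its nearest already-selected vector is largest."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    selected = [vectors[0]]  # assumption: seed with the first vector
    nearest = [dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=nearest.__getitem__)
        selected.append(vectors[i])
        nearest = [min(d, dist(v, vectors[i])) for v, d in zip(vectors, nearest)]
    return selected

points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(farthest_first(points, 2))  # [[0.0, 0.0], [1.0, 1.0]]
```

Keeping the running nearest-selected distance makes each of the k rounds a single pass over the set, O(nk) distance evaluations in total.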
The selected farthest weight vectors are:
[0.512, 1.000, 0.087, 0.190, 0.107, 0.226, 0.204] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 0.483, 0.818, 0.773, 0.478, 0.727] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.929, 1.000, 0.182, 0.238, 0.188, 0.146, 0.270] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.367, 1.000, 0.160, 0.170, 0.077, 0.200, 0.178] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.488, 1.000, 0.231, 0.167, 0.105, 0.122, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 8 matches and 56 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)103_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 103), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)103_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 417
Number of entity ID pairs that occurred more than once: 0
Analyse set of 417 weight vectors
Containing 200 true matches and 217 true non-matches
(47.96% true matches)
Identified 391 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 377 (96.42%)
2 : 11 (2.81%)
3 : 2 (0.51%)
12 : 1 (0.26%)
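The table above is a frequency-of-frequencies: 377 + 11 + 2 + 1 = 391 unique vectors covering 377 + 22 + 6 + 12 = 417 rows. A sketch of how such a table can be derived with `collections.Counter` (illustrative, not the tool's actual code):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in vectors)  # vector -> its occurrences
    return Counter(per_vector.values())              # occurrences -> #unique vectors

# Toy data: one vector appears twice, one once, one three times.
vecs = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9), (0.9, 0.9), (0.9, 0.9)]
dist = occurrence_distribution(vecs)
print(dict(sorted(dist.items())))  # {1: 1, 2: 1, 3: 1}
```

Summing the values of the result gives the unique-vector count, and summing key × value recovers the total number of rows.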
Identified 1 non-pure unique weight vector (from 391 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.917 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 216
Removed 1 non-pure weight vector
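A unique weight vector is non-pure when identical vectors occur with both labels; the cleaning step keeps the majority label's copies and drops the minority's (here one vector with pureness 11/12 ≈ 0.917 loses its single minority copy, 417 → 416). A sketch of this step with illustrative names (not the tool's API):

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """Drop the minority-class copies of any vector that occurs with
    both labels, keeping only the majority label's occurrences."""
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[tuple(vec)].append(is_match)
    cleaned = []
    for vec, labels in groups.items():
        n_true = sum(labels)
        n_false = len(labels) - n_true
        majority = n_true >= n_false          # assumption: ties kept as matches
        keep = max(n_true, n_false)           # pure vectors keep all copies
        cleaned.extend((list(vec), majority) for _ in range(keep))
    return cleaned

# Toy data: one vector with pureness 11/12, one pure non-match vector.
data = [([0.9, 0.9], True)] * 11 + [([0.9, 0.9], False)] + [([0.1, 0.1], False)] * 3
print(len(remove_minority_copies(data)))  # 14
```

Pure vectors pass through unchanged, so only the single conflicting copy is lost here.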
Final number of weight vectors to use: 416
Number of unique weight vectors: 391
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(391, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 391 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using "far" method
Farthest first selection of 77 weight vectors from 391 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 39 matches and 38 non-matches
Purity of oracle classification: 0.506
Entropy of oracle classification: 1.000
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
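Throughout the log the oracle is a simulated human reviewer with a configurable accuracy; at 100%, as here, no labels are flipped and the false-match/false-non-match counts stay at zero. A hedged sketch of such a simulator (the function name and interface are illustrative, not the tool's actual API):

```python
import random

def simulated_oracle(true_labels, accuracy=1.0, seed=0):
    """Return labels as a reviewer with the given accuracy would:
    each true label is kept with probability `accuracy`, flipped otherwise."""
    rng = random.Random(seed)
    return [lab if rng.random() < accuracy else not lab for lab in true_labels]

# This round: 39 true matches and 38 true non-matches, perfect reviewer.
truth = [True] * 39 + [False] * 38
labels = simulated_oracle(truth, accuracy=1.0)
print(labels == truth, sum(labels), len(labels) - sum(labels))  # True 39 38
```

Lowering `accuracy` below 1.0 would introduce the false matches and false non-matches that the log tracks separately.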
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 314 weight vectors
Based on 39 matches and 38 non-matches
Classified 133 matches and 181 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.5064935064935064, 0.9998783322990061, 0.5064935064935064)
(181, 0.5064935064935064, 0.9998783322990061, 0.5064935064935064)
Current size of match and non-match training data sets: 39 / 38
Selected cluster (queue ordering: random) with:
- Purity 0.51 and entropy 1.00
- Size 181 weight vectors
- Estimated match proportion 0.506
Sample size for this cluster: 63
Farthest first selection of 63 weight vectors from 181 vectors
The selected farthest weight vectors are:
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.717, 1.000, 0.240, 0.231, 0.065, 0.192, 0.184] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.367, 1.000, 0.160, 0.170, 0.077, 0.200, 0.178] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.740, 1.000, 0.261, 0.273, 0.186, 0.171, 0.095] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.483, 0.818, 0.773, 0.478, 0.727] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.663, 1.000, 0.273, 0.244, 0.226, 0.196, 0.238] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.488, 1.000, 0.231, 0.167, 0.105, 0.122, 0.000] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.850, 1.000, 0.179, 0.205, 0.188, 0.061, 0.180] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00 accuracy on 63 weight vectors
The oracle will correctly classify 63 weight vectors and wrongly classify 0
Classified 8 matches and 55 non-matches
Purity of oracle classification: 0.873
Entropy of oracle classification: 0.549
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 63 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(15)193_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 193), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)193_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.05 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)474_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 474), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)474_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1038 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
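The farthest-first selection above can be sketched as a greedy traversal: each new pick is the vector farthest from everything already selected. This is a minimal sketch assuming Euclidean distance and a deterministic start from the first vector; the tool's actual metric, seeding, and tie-breaking may differ, and the function name is hypothetical.

```python
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors so that each new pick maximises the
    distance to its nearest already-selected vector (Euclidean)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Seed with the first vector; real implementations often seed
    # randomly or from the centroid instead.
    selected = [vectors[0]]
    # min_dist[j] = distance from vectors[j] to its nearest selected vector
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: min_dist[j])
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            min_dist[j] = min(min_dist[j], dist(v, vectors[i]))
    return selected
```

Because each pick maximises the minimum distance to the current sample, the selection spreads across the cluster rather than concentrating in dense regions, which is why both clear matches and clear non-matches appear in the listings above.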
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
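The purity and entropy figures above follow directly from the two class counts. A sketch, assuming purity is the majority-class share and entropy is the standard binary (base-2) Shannon entropy of the match proportion — both assumptions reproduce the logged 0.739 / 0.829 for 23 matches and 65 non-matches; the function name is hypothetical:

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity = majority-class proportion; entropy = binary Shannon
    entropy (base 2) of the match proportion."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```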
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 103 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(10)949_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 949), dtype: object
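The precision, recall, and f-measure values in the per-file summary above follow the usual definitions from the tp, fp, and fn counts. A sketch (the function name is hypothetical; with tp=102, fp=1, fn=197 it reproduces the logged 0.990291 / 0.341137 / 0.507463):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)          # fraction of declared matches that are true
    recall = tp / (tp + fn)             # fraction of true matches that were found
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```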
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)949_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 565
Number of entity ID pairs that occurred more than once: 0
Analyse set of 565 weight vectors
Containing 147 true matches and 418 true non-matches
(26.02% true matches)
Identified 548 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 539 (98.36%)
2 : 6 (1.09%)
3 : 2 (0.36%)
8 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 548 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 132
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 415
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 557
Number of unique weight vectors: 547
Time to load and analyse the weight vector file: 0.01 sec
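The duplicate analysis above (unique weight vectors and how often each occurs) amounts to two nested frequency counts. A minimal sketch using `collections.Counter`; the function name is hypothetical:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Count distinct weight vectors, then tabulate how many
    distinct vectors occur with each frequency."""
    per_vector = Counter(tuple(v) for v in vectors)      # vector -> occurrence count
    distribution = Counter(per_vector.values())          # occurrence count -> #vectors
    return len(per_vector), dict(sorted(distribution.items()))
```

Applied to the 557 vectors above, this would yield 547 unique vectors and the "Occurrence : Number of weight vectors" table printed in the log.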
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(547, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 547 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 547 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 31 matches and 50 non-matches
Purity of oracle classification: 0.617
Entropy of oracle classification: 0.960
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 466 weight vectors
Based on 31 matches and 50 non-matches
Classified 107 matches and 359 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(107, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
(359, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
Current size of match and non-match training data sets: 31 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 107 weight vectors
- Estimated match proportion 0.383
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 107 vectors
The selected farthest weight vectors are:
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 42 matches and 7 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing the file: diverg(20)458_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 458), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)458_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 956 non-matches
39.0
Analysing the file: diverg(15)995_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 995), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)995_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 861
Number of entity ID pairs that occurred more than once: 0
Analyse set of 861 weight vectors
Containing 227 true matches and 634 true non-matches
(26.36% true matches)
Identified 804 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 767 (95.40%)
2 : 34 (4.23%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 804 unique weight vectors)
Pureness (as proportion of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 613
Removed 1 non-pure weight vector
Final number of weight vectors to use: 860
Number of unique weight vectors: 804
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(804, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 804 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 804 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
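The farthest-first selection used above can be sketched as a greedy traversal: start from a seed vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. A minimal sketch in Python — the seed choice and the Euclidean distance are assumptions, since the log does not show them:

```python
import math

def farthest_first(vectors, k):
    """Greedily select k vectors: start from the first vector, then
    repeatedly add the vector farthest from all selected vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # seed choice is an assumption
    # running distance from each vector to its nearest selected vector
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[idx]))
    return selected
```

Because each new pick only updates one running minimum per vector, selecting k of n vectors costs O(k·n) distance computations.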
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
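The purity and entropy figures above follow directly from the 28/58 split: purity is the majority-class fraction, and entropy is the binary Shannon entropy (in bits) of the match proportion. A small sketch of that computation:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction; entropy is the binary
    Shannon entropy of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1 - p)
    if p in (0.0, 1.0):
        entropy = 0.0  # a pure cluster carries no uncertainty
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy
```

For 28 matches and 58 non-matches this yields purity ≈ 0.674 and entropy ≈ 0.910, matching the logged values.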
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 718 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 565 non-matches
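The SVM step trains on the oracle-labelled vectors (28 matches, 58 non-matches) and splits the remaining 718 vectors into predicted match and non-match sub-clusters. As a stand-in for whatever SVM implementation the tool actually uses, here is a minimal linear SVM trained with Pegasos-style sub-gradient descent on the hinge loss; the hyperparameters and the deterministic cycling through the data are simplifications:

```python
def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.
    X: list of feature lists, y: labels in {+1, -1}."""
    d = len(X[0])
    w = [0.0] * (d + 1)  # last component acts as the bias term
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            x = xi + [1.0]  # constant feature for the bias
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            # shrink weights, then step along the hinge sub-gradient
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w

def predict(w, xi):
    x = xi + [1.0]
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

Vectors with a non-negative decision value would go to the predicted-match sub-cluster, the rest to the predicted-non-match one.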
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(565, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 565 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 73
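The per-cluster sample sizes are consistent with Cochran's sample-size formula with finite-population correction, fed with the cluster's estimated match proportion. This is a reconstruction — the log does not show the formula, and the confidence level (z = 1.96), margin of error (0.1), and truncation are assumptions — but it reproduces the sample sizes logged for the later clusters (73, 84, 85, 56):

```python
def sample_size(cluster_size, est_match_prop, z=1.96, error=0.1):
    """Cochran's sample-size formula with finite-population correction.
    z and error are assumed values (95% confidence, 0.1 margin)."""
    p = est_match_prop
    n0 = (z ** 2) * p * (1 - p) / (error ** 2)       # infinite-population size
    return int(n0 / (1 + (n0 - 1) / cluster_size))   # correct for N, truncate
```

For the 565-vector cluster with estimated match proportion 28/86 this gives 73, and for a 162-vector cluster with proportion 29/85 it gives 56, as logged further below.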
Farthest first selection of 73 weight vectors from 565 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 7 matches and 66 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.456
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)243_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (15, 1 - acm diverg, 243), dtype: object
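The precision, recall, and f-measure in the block above follow from the tp, fp, and fn counts (95, 2, 204). A quick sketch of the computation:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 written directly in terms of the counts (harmonic mean of P and R)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1
```

With tp = 95, fp = 2, fn = 204 this reproduces the logged values 0.979381, 0.317726, and 0.479798.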
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)243_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 689
Number of entity ID pairs that occurred more than once: 0
Analyse set of 689 weight vectors
Containing 167 true matches and 522 true non-matches
(24.24% true matches)
Identified 670 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 657 (98.06%)
2 : 10 (1.49%)
3 : 2 (0.30%)
6 : 1 (0.15%)
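The occurrence distribution above (657 unique vectors occurring once, 10 twice, 2 three times, and 1 six times, accounting for all 689 vectors and 670 unique ones) is a count-of-counts, e.g.:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of unique vectors
    that occur that often."""
    per_vector = Counter(tuple(v) for v in vectors)  # tuples are hashable
    return dict(sorted(Counter(per_vector.values()).items()))
```

Summing count × frequency recovers the total number of vectors, and summing the frequencies recovers the number of unique vectors.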
Identified 0 non-pure unique weight vectors (from 670 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 150
0.000 : 520
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 689
Number of unique weight vectors: 670
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(670, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 670 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 670 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 586 weight vectors
Based on 29 matches and 55 non-matches
Classified 113 matches and 473 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(113, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(473, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 473 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 473 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.870, 0.619, 0.643, 0.700, 0.524] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 5 matches and 68 non-matches
Purity of oracle classification: 0.932
Entropy of oracle classification: 0.360
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 68
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analysing the file: diverg(20)519_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 519), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)519_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 845
Number of entity ID pairs that occurred more than once: 0
Analyse set of 845 weight vectors
Containing 227 true matches and 618 true non-matches
(26.86% true matches)
Identified 788 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 751 (95.30%)
2 : 34 (4.31%)
3 : 2 (0.25%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 788 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 597
Removed 1 non-pure weight vector
Final number of weight vectors to use: 844
Number of unique weight vectors: 788
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(788, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 788 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 788 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 703 weight vectors
Based on 29 matches and 56 non-matches
Classified 162 matches and 541 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(162, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(541, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 162 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 162 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
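The simulated-oracle step above (classify a sample at a given accuracy, reporting in advance exactly how many vectors will be wrongly classified) can be sketched as follows. The fixed flip count is an assumption inferred from the log's phrasing, and `simulate_oracle` is a hypothetical name:

```python
import random

def simulate_oracle(true_labels, accuracy=1.0, seed=0):
    """Return oracle labels: exactly round((1 - accuracy) * n) of the
    true match/non-match labels are flipped (assumption from the log's
    'will correctly classify X and wrongly classify Y' phrasing)."""
    rnd = random.Random(seed)
    n_wrong = round((1.0 - accuracy) * len(true_labels))
    wrong = set(rnd.sample(range(len(true_labels)), n_wrong))
    return [(not lbl) if i in wrong else lbl for i, lbl in enumerate(true_labels)]
```

At 100% accuracy, as in this run, no labels are flipped and the oracle simply reveals the ground truth.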
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
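The purity and entropy reported for each oracle classification are the standard two-class measures over the match/non-match counts; a minimal sketch (binary Shannon entropy in bits):

```python
from math import log2

def purity_entropy(n_match, n_nonmatch):
    """Two-class purity and Shannon entropy (bits) of a classified sample."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1 - p)  # fraction in the majority class
    entropy = 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))
    return purity, entropy
```

For the 48 matches and 8 non-matches above this gives purity 0.857 and entropy 0.592, matching the logged values.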
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)823_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 823), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)823_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
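The occurrence distribution above (how often each distinct weight vector appears) can be computed with two passes of `collections.Counter`; `occurrence_distribution` is a hypothetical helper name:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map: occurrence count -> number of unique weight vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in vectors)  # occurrences of each unique vector
    return Counter(per_vector.values())              # distribution of those counts
```

The number of unique weight vectors is then `len(Counter(tuple(v) for v in vectors))`, which the log reports alongside the distribution.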
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
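The "far" initial selection used above is a greedy farthest-first traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal NumPy sketch (Euclidean distance and random seeding are assumptions; the program may use a different metric or start point):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first selection of k indices (max-min distance criterion)."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    selected = [rng.integers(len(vectors))]  # random starting vector (assumption)
    # min_dist[i] = distance from vector i to its nearest already-selected vector
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))        # farthest from everything selected so far
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

This spread-maximising choice explains why the sampled lists above mix extreme vectors (all-1.0, all-0.0 components) rather than near-duplicates.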
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 28 matches and 57 non-matches
Classified 141 matches and 543 non-matches
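The split step trains an SVM on the oracle-labelled sample and partitions the remaining vectors into a predicted-match and a predicted-non-match cluster, which then enter the queue. A sketch using scikit-learn's `SVC` (the kernel and parameters are assumptions; the log does not state them):

```python
import numpy as np
from sklearn.svm import SVC

def split_by_svm(train_X, train_y, remaining):
    """Train on oracle-labelled vectors; split `remaining` into
    (predicted matches, predicted non-matches)."""
    clf = SVC()  # default RBF kernel is an assumption
    clf.fit(train_X, train_y)
    remaining = np.asarray(remaining, dtype=float)
    pred = clf.predict(remaining)
    return remaining[pred == 1], remaining[pred == 0]
```

Here the 28 matches and 57 non-matches form the training set, and the 684 unlabelled vectors are split into clusters of 141 and 543.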
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(543, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 543 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 543 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 12 matches and 61 non-matches
Purity of oracle classification: 0.836
Entropy of oracle classification: 0.645
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)852_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 852), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)852_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 793 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 793 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.736, 1.000, 0.250, 0.290, 0.172, 0.188, 0.286] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 14 matches and 59 non-matches
Purity of oracle classification: 0.808
Entropy of oracle classification: 0.705
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(15)437_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 437), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)437_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 700
Number of entity ID pairs that occurred more than once: 0
Analyse set of 700 weight vectors
Containing 214 true matches and 486 true non-matches
(30.57% true matches)
Identified 665 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 650 (97.74%)
2 : 12 (1.80%)
3 : 2 (0.30%)
20 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 665 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 179
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 485
Removed 1 non-pure weight vector
Final number of weight vectors to use: 699
Number of unique weight vectors: 665
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(665, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 665 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 665 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
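The farthest-first selection logged above repeatedly picks the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and seeding with the vector farthest from the centroid (the tool's actual seeding rule is not shown in this log):

```python
import math

def farthest_first(vectors, k):
    """Select k vectors by farthest-first traversal (Euclidean distance)."""
    n = len(vectors[0])
    # Seed with the vector farthest from the centroid (one common choice;
    # the seeding rule here is an assumption, not taken from the log).
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]
    selected = [max(vectors, key=lambda v: math.dist(v, centroid))]

    while len(selected) < k:
        # Pick the vector maximising its distance to the nearest selected one.
        nxt = max((v for v in vectors if v not in selected),
                  key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(nxt)
    return selected
```

This greedy traversal favours extreme, mutually distant vectors, which is why the selected lists above are dominated by vectors with many 0.000 and 1.000 components.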
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
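The purity and entropy figures reported after each oracle step are the majority-class fraction and the binary (base-2) entropy of the match/non-match split. A sketch reproducing the numbers above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary entropy of the split."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0 if p in (0.0, 1.0) else \
        -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return purity, entropy

# 31 matches / 53 non-matches as logged above:
# purity ~ 0.631, entropy ~ 0.950
purity, entropy = purity_entropy(31, 53)
```

The same split (31/84 = 0.369) also explains the estimated match proportion propagated to the child clusters in the next loop.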
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 581 weight vectors
Based on 31 matches and 53 non-matches
Classified 314 matches and 267 non-matches
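The SVM split of the remaining unlabelled vectors can be approximated as below. This is a sketch using scikit-learn's `SVC` with default settings; the log names neither the library nor the kernel, so both are assumptions:

```python
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on the oracle-labelled vectors and split the rest."""
    clf = SVC()  # kernel and parameters are assumptions; the log does not state them
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, preds) if p]
    non_matches = [v for v, p in zip(remaining_vectors, preds) if not p]
    return matches, non_matches
```

The two resulting subsets become the two new clusters pushed onto the queue, each initially inheriting the parent's purity, entropy, and estimated match proportion.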
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(314, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(267, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 314 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 314 vectors
The selected farthest weight vectors are:
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.890, 1.000, 0.281, 0.136, 0.183, 0.250, 0.163] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 42 matches and 28 non-matches
Purity of oracle classification: 0.600
Entropy of oracle classification: 0.971
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(20)14_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 14), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)14_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
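The occurrence distribution above (how many unique weight vectors appear once, twice, and so on) can be derived with two nested Counters. A minimal sketch:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to (number of unique vectors, percentage)."""
    per_vector = Counter(tuple(v) for v in weight_vectors)  # vector -> count
    distribution = Counter(per_vector.values())             # count -> #vectors
    total_unique = len(per_vector)
    return {count: (n, 100.0 * n / total_unique)
            for count, n in sorted(distribution.items())}
```

Applied to the 1094 vectors of this file it would yield 1038 unique vectors, of which 1002 occur once, 33 twice, 2 three times, and 1 twenty times, as logged.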
Identified 1 non-pure unique weight vectors (from 1038 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 950 non-matches
46.0
Analysing file: diverg(10)303_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 303), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)303_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 691
Number of entity ID pairs that occurred more than once: 0
Analyse set of 691 weight vectors
Containing 191 true matches and 500 true non-matches
(27.64% true matches)
Identified 667 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 650 (97.45%)
2 : 14 (2.10%)
3 : 2 (0.30%)
7 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 667 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.000 : 498
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 691
Number of unique weight vectors: 667
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(667, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 667 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 667 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 34 matches and 50 non-matches
Purity of oracle classification: 0.595
Entropy of oracle classification: 0.974
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 583 weight vectors
Based on 34 matches and 50 non-matches
Classified 274 matches and 309 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(274, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
(309, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
Current size of match and non-match training data sets: 34 / 50
Selected cluster (queue ordering: random):
- Purity 0.60 and entropy 0.97
- Size 309 weight vectors
- Estimated match proportion 0.405
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 309 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.565, 0.667, 0.600, 0.412, 0.381] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.455, 0.714, 0.429, 0.550, 0.647] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
[1.000, 0.000, 0.864, 0.667, 0.435, 0.700, 0.600] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.680, 0.000, 0.609, 0.737, 0.600, 0.529, 0.696] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 0.000, 0.750, 0.905, 0.667, 0.500, 0.571] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.846, 0.737, 0.706, 0.583, 0.800] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 0 matches and 71 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(15)475_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 475), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)475_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 943
Number of entity ID pairs that occurred more than once: 0
Analyse set of 943 weight vectors
Containing 199 true matches and 744 true non-matches
(21.10% true matches)
Identified 898 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 864 (96.21%)
2 : 31 (3.45%)
3 : 2 (0.22%)
11 : 1 (0.11%)
Identified 1 non-pure unique weight vectors (from 898 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 723
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 942
Number of unique weight vectors: 898
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(898, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 898 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 898 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
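The "far" method used for this selection is a greedy farthest-first traversal; a sketch assuming Euclidean distance and a random starting vector (the actual metric and seeding of the run are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: start from a random vector, then
    repeatedly add the vector whose distance to its nearest already
    selected vector is largest."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    # Distance from every vector to its closest selected vector so far
    min_dist = np.linalg.norm(X - X[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen
```

This spreads the sample across the weight-vector space, which is why the selected vectors above mix clear matches, clear non-matches, and borderline cases.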
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 25 matches and 61 non-matches
Purity of oracle classification: 0.709
Entropy of oracle classification: 0.870
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
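The purity and entropy reported after each oracle call appear to be the majority-class fraction and the binary Shannon entropy of the match/non-match split; a sketch that reproduces the logged values:

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity is the majority-class fraction of the classified sample;
    entropy is the binary Shannon entropy of the match/non-match split."""
    total = n_match + n_nonmatch
    p_match = n_match / total
    purity = max(p_match, 1.0 - p_match)
    entropy = 0.0
    for p in (p_match, 1.0 - p_match):
        if p > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= p * math.log2(p)
    return purity, entropy

# Reproduces the logged values for 25 matches / 61 non-matches
purity, entropy = purity_entropy(25, 61)
print(round(purity, 3), round(entropy, 3))  # 0.709 0.87
```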
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 812 weight vectors
Based on 25 matches and 61 non-matches
Classified 123 matches and 689 non-matches
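The SVM step, which splits the remaining unlabelled weight vectors using the oracle-labelled sample as training data, can be sketched with scikit-learn (an assumption on my part; the kernel and parameters of the actual system are not shown in the log):

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, remaining_vecs):
    """Fit an SVM on the oracle-labelled sample, then split the remaining
    weight vectors into predicted matches (label 1) and non-matches
    (label 0), yielding the two child clusters pushed onto the queue."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p == 1]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if p == 0]
    return matches, non_matches
```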
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
(689, 0.7093023255813954, 0.8696207740543749, 0.29069767441860467)
Current size of match and non-match training data sets: 25 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.71 and entropy 0.87
- Size 123 weight vectors
- Estimated match proportion 0.291
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(10)977_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 977), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)977_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 375
Number of entity ID pairs that occurred more than once: 0
Analyse set of 375 weight vectors
Containing 195 true matches and 180 true non-matches
(52.00% true matches)
Identified 348 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 332 (95.40%)
2 : 13 (3.74%)
3 : 2 (0.57%)
11 : 1 (0.29%)
Identified 1 non-pure unique weight vector (from 348 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 170
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 177
Removed 1 non-pure weight vector
Final number of weight vectors to use: 374
Number of unique weight vectors: 348
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(348, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 348 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 75
Perform initial selection using "far" method
Farthest first selection of 75 weight vectors from 348 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 31 matches and 44 non-matches
Purity of oracle classification: 0.587
Entropy of oracle classification: 0.978
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 273 weight vectors
Based on 31 matches and 44 non-matches
Classified 144 matches and 129 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 75
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.5866666666666667, 0.9782176659354248, 0.41333333333333333)
(129, 0.5866666666666667, 0.9782176659354248, 0.41333333333333333)
Current size of match and non-match training data sets: 31 / 44
Selected cluster (queue ordering: random) with:
- Purity 0.59 and entropy 0.98
- Size 144 weight vectors
- Estimated match proportion 0.413
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 144 vectors
The selected farthest weight vectors are:
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 50 matches and 7 non-matches
Purity of oracle classification: 0.877
Entropy of oracle classification: 0.537
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(15)134_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 134), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)134_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1001
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1001 weight vectors
Containing 198 true matches and 803 true non-matches
(19.78% true matches)
Identified 959 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 924 (96.35%)
2 : 32 (3.34%)
3 : 2 (0.21%)
7 : 1 (0.10%)
Identified 0 non-pure unique weight vectors (from 959 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.000 : 783
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1001
Number of unique weight vectors: 959
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(959, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 959 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 959 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 872 weight vectors
Based on 26 matches and 61 non-matches
Classified 106 matches and 766 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(106, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(766, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 106 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 46
Farthest first selection of 46 weight vectors from 106 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[0.511, 1.000, 1.000, 1.000, 1.000, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 46 weight vectors
The oracle will correctly classify 46 weight vectors and wrongly classify 0
Classified 46 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 46 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analyzing file: diverg(10)372_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (10, 1 - acm diverg, 372), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)372_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 945
Number of entity ID pairs that occurred more than once: 0
Analyse set of 945 weight vectors
Containing 153 true matches and 792 true non-matches
(16.19% true matches)
Identified 908 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 877 (96.59%)
2 : 28 (3.08%)
3 : 2 (0.22%)
6 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 908 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 136
0.000 : 772
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 945
Number of unique weight vectors: 908
Time to load and analyse the weight vector file: 0.01 sec
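The analysis step above groups identical weight vectors, counts how often each unique vector occurs, and computes its pureness (fraction of its occurrences labelled as true matches). A minimal sketch of that bookkeeping, with hypothetical helper names (the actual implementation may differ):

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors; return the frequency distribution
    of occurrence counts and the pureness of each unique vector."""
    groups = {}  # tuple(vector) -> list of match labels (True/False)
    for vec, lab in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(lab)
    # How many unique vectors occur once, twice, three times, ...
    freq_dist = Counter(len(labs) for labs in groups.values())
    # Pureness = fraction of occurrences that are true matches
    pureness = {v: sum(labs) / len(labs) for v, labs in groups.items()}
    return freq_dist, pureness

vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9)]
labels = [True, True, False]
freq, pure = analyse_weight_vectors(vectors, labels)
# One vector occurs twice (pureness 1.0), one occurs once (pureness 0.0)
```

A vector with pureness strictly between 0 and 1 is "non-pure", and the log above shows its minority-class occurrences being removed.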
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(908, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 908 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 908 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
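The "far" method above is presumably a greedy farthest-first traversal: each step picks the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the cluster. A sketch under that assumption (Euclidean distance, starting from index 0; the real starting point and metric may differ):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily select k vectors: always add the vector farthest
    (by minimum Euclidean distance) from the current selection."""
    selected = [start]
    # min_dist[i] = distance from vectors[i] to its nearest selected vector
    min_dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[nxt]))
    return selected

# The outlier (5, 5) is picked right after the start vector
print(farthest_first([(0, 0), (1, 0), (0.1, 0), (5, 5)], 2))  # → [0, 3]
```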
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
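The purity and entropy figures above follow directly from the match / non-match counts: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the split (in bits). A minimal sketch reproducing the Loop 1 values (30 matches, 57 non-matches):

```python
import math

def purity_entropy(num_match, num_non_match):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy of the match / non-match split, in bits."""
    total = num_match + num_non_match
    purity = max(num_match, num_non_match) / total
    entropy = 0.0
    for count in (num_match, num_non_match):
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return purity, entropy

p, e = purity_entropy(30, 57)
# p ≈ 0.655, e ≈ 0.929 — matching the log output above
```

A perfectly pure cluster (all one class) gives purity 1.0 and entropy 0.0, as seen in the oracle block at the top of this section.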
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 821 weight vectors
Based on 30 matches and 57 non-matches
Classified 248 matches and 573 non-matches
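The SVM step trains a binary classifier on the oracle-labelled sample and splits the remaining weight vectors into predicted-match and predicted-non-match clusters. A sketch assuming scikit-learn (the kernel and parameters here are illustrative, not necessarily those of the actual implementation):

```python
from sklearn import svm

def split_cluster(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on the oracle-classified sample (labels: 1 = match,
    0 = non-match) and split the remaining vectors by prediction."""
    clf = svm.SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, pred) if p == 1]
    non_matches = [v for v, p in zip(remaining_vecs, pred) if p == 0]
    return matches, non_matches
```

Each resulting sub-cluster is then pushed back onto the queue with the purity, entropy, and estimated match proportion inherited from its parent, as the Loop 2 queue listing above shows.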
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(248, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(573, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 248 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 64
Farthest first selection of 64 weight vectors from 248 vectors
The selected farthest weight vectors are:
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.512, 1.000, 0.087, 0.190, 0.107, 0.226, 0.204] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.663, 1.000, 0.132, 0.143, 0.241, 0.174, 0.167] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 42 matches and 22 non-matches
Purity of oracle classification: 0.656
Entropy of oracle classification: 0.928
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 22
Number of false non-matches: 0
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analyzing file: diverg(10)887_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 887), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)887_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 583
Number of entity ID pairs that occurred more than once: 0
Analyse set of 583 weight vectors
Containing 208 true matches and 375 true non-matches
(35.68% true matches)
Identified 550 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 533 (96.91%)
2 : 14 (2.55%)
3 : 2 (0.36%)
16 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 550 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 372
Removed 1 non-pure weight vector
Final number of weight vectors to use: 582
Number of unique weight vectors: 550
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(550, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 550 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 550 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 29 matches and 53 non-matches
Purity of oracle classification: 0.646
Entropy of oracle classification: 0.937
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 468 weight vectors
Based on 29 matches and 53 non-matches
Classified 151 matches and 317 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
(317, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
Current size of match and non-match training data sets: 29 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.94
- Size 151 weight vectors
- Estimated match proportion 0.354
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 151 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 51 matches and 5 non-matches
Purity of oracle classification: 0.911
Entropy of oracle classification: 0.434
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analyzing file: diverg(20)312_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (20, 1 - acm diverg, 312), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)312_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1061
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1061 weight vectors
Containing 188 true matches and 873 true non-matches
(17.72% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 988 (96.96%)
2 : 28 (2.75%)
3 : 2 (0.20%)
11 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1060
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 75 matches and 857 non-matches
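The log reports an SVM trained on the oracle-labelled sample (23 matches, 64 non-matches) and applied to the remaining unlabelled vectors. As a dependency-free illustration of this train-then-classify step only — a nearest-centroid stand-in, not the SVM the tool actually uses — one could write:

```python
def centroid_classify(matches, non_matches, unlabelled):
    # Compute one centroid per class from the oracle-labelled training
    # vectors, then label each unlabelled vector by the nearer centroid
    # (True = match, False = non-match).
    def centroid(vecs):
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    m_c, n_c = centroid(matches), centroid(non_matches)
    return [dist2(v, m_c) <= dist2(v, n_c) for v in unlabelled]
```

The real pipeline would substitute an SVM fit on the same training split; the data flow (labelled sample in, predicted labels for the rest out) is identical.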
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(75, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(857, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 75 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 38
Farthest first selection of 38 weight vectors from 75 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
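The farthest-first selections logged throughout can be sketched as a greedy traversal. This is an assumption about the exact variant (seeding from the first vector and using Euclidean distance), not the tool's confirmed implementation:

```python
import math

def farthest_first(vectors, k):
    # Greedy farthest-first traversal: seed with the first vector, then
    # repeatedly add the vector whose minimum Euclidean distance to the
    # already-selected set is largest.
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

This maximises spread, which is why the selected samples above mix very high and very low similarity vectors.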
Perform oracle with 100.00% accuracy on 38 weight vectors
The oracle will correctly classify 38 weight vectors and wrongly classify 0
Classified 38 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 38
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 38 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing the file: diverg(10)24_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 24), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)24_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 866
Number of entity ID pairs that occurred more than once: 0
Analyse set of 866 weight vectors
Containing 154 true matches and 712 true non-matches
(17.78% true matches)
Identified 830 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 802 (96.63%)
2 : 25 (3.01%)
3 : 2 (0.24%)
8 : 1 (0.12%)
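The distribution above (802 singletons, 25 pairs, 2 triples, one vector occurring 8 times — 830 unique out of 866 in total) amounts to two nested counts; a minimal sketch:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # Count duplicates of each unique weight vector, then count how many
    # unique vectors share each occurrence frequency.
    per_vector = Counter(map(tuple, weight_vectors))
    return dict(sorted(Counter(per_vector.values()).items()))
```

Summing frequency × count over the result recovers the total number of weight vectors, which is a useful sanity check on logs like this one.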
Identified 1 non-pure unique weight vector (from 830 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 138
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 691
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 858
Number of unique weight vectors: 829
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(829, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 829 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 829 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
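The "oracle with 100.00% accuracy" can be simulated as a label-flipping step; a hypothetical reconstruction (the log does not show the actual implementation) where each true label survives with probability equal to the accuracy:

```python
import random

def noisy_oracle(true_labels, accuracy, rng=None):
    # Return each true label unchanged with probability `accuracy`,
    # otherwise flip it; at accuracy 1.0 the oracle is perfect, which is
    # why every logged false-match / false-non-match count here is 0.
    rng = rng or random.Random(0)
    return [lbl if rng.random() < accuracy else not lbl
            for lbl in true_labels]
```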
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 743 weight vectors
Based on 24 matches and 62 non-matches
Classified 60 matches and 683 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(60, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(683, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 60 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 34
Farthest first selection of 34 weight vectors from 60 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 1.000, 0.952, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.420, 1.000, 1.000, 1.000, 1.000, 1.000, 0.947] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
Perform oracle with 100.00% accuracy on 34 weight vectors
The oracle will correctly classify 34 weight vectors and wrongly classify 0
Classified 33 matches and 1 non-match
Purity of oracle classification: 0.971
Entropy of oracle classification: 0.191
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 34 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing the file: diverg(10)690_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 690), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)690_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 803
Number of entity ID pairs that occurred more than once: 0
Analyse set of 803 weight vectors
Containing 208 true matches and 595 true non-matches
(25.90% true matches)
Identified 756 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 721 (95.37%)
2 : 32 (4.23%)
3 : 2 (0.26%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 756 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 574
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 802
Number of unique weight vectors: 756
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(756, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 756 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 756 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 26 matches and 59 non-matches
Purity of oracle classification: 0.694
Entropy of oracle classification: 0.888
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 671 weight vectors
Based on 26 matches and 59 non-matches
Classified 138 matches and 533 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(138, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
(533, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
Current size of match and non-match training data sets: 26 / 59
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 533 weight vectors
- Estimated match proportion 0.306
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 533 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00 accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 13 matches and 58 non-matches
Purity of oracle classification: 0.817
Entropy of oracle classification: 0.687
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(10)438_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (10, 1 - acm diverg, 438), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)438_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 621
Number of entity ID pairs that occurred more than once: 0
Analyse set of 621 weight vectors
Containing 160 true matches and 461 true non-matches
(25.76% true matches)
Identified 587 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 557 (94.89%)
2 : 27 (4.60%)
3 : 2 (0.34%)
4 : 1 (0.17%)
Identified 0 non-pure unique weight vectors (from 587 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 146
0.000 : 441
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 621
Number of unique weight vectors: 587
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(587, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 587 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 587 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 25 matches and 57 non-matches
Purity of oracle classification: 0.695
Entropy of oracle classification: 0.887
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
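The purity and entropy figures reported throughout this log follow directly from the oracle's match/non-match counts. A minimal sketch, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion (the function name is illustrative):

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a
    match/non-match split, as reported after each oracle round."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# 25 matches and 57 non-matches, as classified above
purity, entropy = purity_and_entropy(25, 57)
print(round(purity, 3), round(entropy, 3))  # 0.695 0.887
```

The match proportion p = 25/82 ≈ 0.305 is also what the log later prints as the "estimated match proportion" of the resulting clusters.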
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 505 weight vectors
Based on 25 matches and 57 non-matches
Classified 97 matches and 408 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(97, 0.6951219512195121, 0.8871723027673717, 0.3048780487804878)
(408, 0.6951219512195121, 0.8871723027673717, 0.3048780487804878)
Current size of match and non-match training data sets: 25 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.89
- Size 408 weight vectors
- Estimated match proportion 0.305
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 408 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
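The farthest-first sampling that produced the list above is a greedy k-center traversal: repeatedly pick the vector whose distance to the already-selected set is largest. A dependency-free sketch (the start index and Euclidean distance are assumptions, not printed in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal over a list of weight vectors."""
    selected = [vectors[start]]
    # distance from each vector to its nearest already-selected vector
    dists = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=dists.__getitem__)
        selected.append(vectors[idx])
        # shrink distances now that vectors[idx] is selected
        for i, v in enumerate(vectors):
            dists[i] = min(dists[i], math.dist(v, vectors[idx]))
    return selected
```

Each new pick maximises coverage of the cluster, which is why the selected samples above mix clear matches, clear non-matches, and borderline vectors.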
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing the file: diverg(20)752_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 752), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)752_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1086
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1086 weight vectors
Containing 220 true matches and 866 true non-matches
(20.26% true matches)
Identified 1030 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 994 (96.50%)
2 : 33 (3.20%)
3 : 2 (0.19%)
20 : 1 (0.10%)
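A frequency table like the one above can be built with two nested Counters — one over the vectors themselves, one over the resulting counts (a sketch; using vector tuples as dictionary keys is an assumption):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct weight vectors
    that occur exactly that often."""
    per_vector = Counter(tuple(v) for v in vectors)
    return Counter(per_vector.values())

vectors = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.9]]
print(sorted(occurrence_distribution(vectors).items()))  # [(1, 2), (2, 1)]
```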
Identified 1 non-pure unique weight vector (from 1030 unique weight vectors)
Pureness (as the proportion of matches) of each unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1085
Number of unique weight vectors: 1030
Time to load and analyse the weight vector file: 0.01 sec
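The non-pure-vector filtering reported above (1086 → 1085 vectors) can be sketched as: group identical weight vectors, and wherever a group mixes match and non-match labels, drop the minority-class copies so every unique vector becomes pure. Tie handling below is an assumption:

```python
from collections import defaultdict

def remove_non_pure(vectors, labels):
    """Drop minority-class copies of any weight vector whose identical
    copies carry mixed match/non-match labels."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[tuple(vec)].append(label)
    kept = []
    for vec, ls in groups.items():
        majority = sum(ls) * 2 >= len(ls)  # ties kept as matches (assumption)
        kept.extend((list(vec), l) for l in ls if l == majority)
    return kept
```

In the run above, one group of 20 identical vectors had pureness 0.950 (19 matches, 1 non-match), so exactly one minority-class copy was removed.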
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1030, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1030 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using the "far" method
Farthest first selection of 88 weight vectors from 1030 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 942 weight vectors
Based on 25 matches and 63 non-matches
Classified 125 matches and 817 non-matches
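The classification step above trains on the oracle-labelled samples and splits the remaining unlabelled vectors of the cluster into two child clusters. The log uses an SVM; as a dependency-free stand-in, a nearest-centroid rule illustrates the same train-then-split shape:

```python
import math

def split_cluster(train_vecs, train_labels, remaining_vecs):
    """Train on oracle-labelled weight vectors, then split the remaining
    vectors into predicted matches / non-matches.  The log uses an SVM;
    this sketch substitutes a nearest-centroid classifier."""
    def centroid(vecs):
        return [sum(c) / len(vecs) for c in zip(*vecs)]
    m_cent = centroid([v for v, l in zip(train_vecs, train_labels) if l])
    n_cent = centroid([v for v, l in zip(train_vecs, train_labels) if not l])
    matches, non_matches = [], []
    for v in remaining_vecs:
        (matches if math.dist(v, m_cent) < math.dist(v, n_cent)
         else non_matches).append(v)
    return matches, non_matches
```

Both predicted partitions are then pushed back onto the queue, which is why the queue length grows to 2 in the next loop.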
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(125, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 125 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 125 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-matches
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
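The re-queueing decision printed after each oracle round can be read as a simple threshold test: a cluster is only finalised once it is both pure enough and small enough. The thresholds below are placeholders; the log does not print the actual values:

```python
def should_split(size, purity, min_purity=0.95, max_size=100):
    """True if the cluster must be split further (hypothetical thresholds)."""
    return purity < min_purity or size > max_size

print(should_split(408, 0.695))  # True: too impure and too large
```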
Reached end of manual classification budget
46.0
Analysing the file: diverg(10)480_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 480), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)480_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 421
Number of entity ID pairs that occurred more than once: 0
Analyse set of 421 weight vectors
Containing 184 true matches and 237 true non-matches
(43.71% true matches)
Identified 400 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 390 (97.50%)
2 : 7 (1.75%)
3 : 2 (0.50%)
11 : 1 (0.25%)
Identified 1 non-pure unique weight vector (from 400 unique weight vectors)
Pureness (as the proportion of matches) of each unique weight vector:
Pureness : Count
1.000 : 163
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 236
Removed 1 non-pure weight vector
Final number of weight vectors to use: 420
Number of unique weight vectors: 400
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(400, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 400 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using the "far" method
Farthest first selection of 77 weight vectors from 400 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 34 matches and 43 non-matches
Purity of oracle classification: 0.558
Entropy of oracle classification: 0.990
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 323 weight vectors
Based on 34 matches and 43 non-matches
Classified 123 matches and 200 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.5584415584415584, 0.9901226308935799, 0.44155844155844154)
(200, 0.5584415584415584, 0.9901226308935799, 0.44155844155844154)
Current size of match and non-match training data sets: 34 / 43
Selected cluster (queue ordering: random) with:
- Purity 0.56 and entropy 0.99
- Size 200 weight vectors
- Estimated match proportion 0.442
Sample size for this cluster: 64
Farthest first selection of 64 weight vectors from 200 vectors
The selected farthest weight vectors are:
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.367, 1.000, 0.160, 0.170, 0.077, 0.200, 0.178] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.483, 0.818, 0.773, 0.478, 0.727] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 9 matches and 55 non-matches
Purity of oracle classification: 0.859
Entropy of oracle classification: 0.586
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing the file: diverg(20)37_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 37), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)37_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
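An occurrence distribution like the one above can be derived with two nested `Counter` passes over the weight vectors; a sketch using hypothetical toy vectors (not the actual data):

```python
from collections import Counter

# Toy weight vectors standing in for the real ones (hypothetical data):
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9), (0.2, 0.9), (0.2, 0.9), (0.7, 0.1)]

vec_counts = Counter(map(tuple, vectors))   # occurrences per unique vector
freq_dist = Counter(vec_counts.values())    # occurrence -> number of unique vectors

for occurrence, num_vectors in sorted(freq_dist.items()):
    pct = 100.0 * num_vectors / len(vec_counts)
    print(f"{occurrence} : {num_vectors} ({pct:.2f}%)")
```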
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
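The minority-class removal described above can be sketched as follows, with toy data mirroring the 0.917-pureness case; the grouping keys and labels here are hypothetical:

```python
from collections import defaultdict

# 12 copies of one vector (11 True, 1 False -> pureness 11/12 ≈ 0.917)
# plus 5 copies of a pure non-match vector (hypothetical data).
records = [((0.9, 0.8), True)] * 11 + [((0.9, 0.8), False)] + [((0.1, 0.2), False)] * 5

by_vec = defaultdict(list)
for vec, label in records:
    by_vec[vec].append(label)

kept = []
for vec, labels in by_vec.items():
    pureness = sum(labels) / len(labels)
    if 0.0 < pureness < 1.0:
        majority = pureness >= 0.5          # drop minority-class copies only
        kept += [(vec, l) for l in labels if l == majority]
    else:
        kept += [(vec, l) for l in labels]  # pure vectors kept unchanged

print(len(records), len(kept))  # prints: 17 16  (one non-pure copy removed)
```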
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
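The farthest-first selection used above can be sketched as a greedy traversal that, after an arbitrary seed, repeatedly picks the vector whose minimum distance to the already-selected set is largest. The seed choice and Euclidean metric are assumptions; the log does not state them:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: after an arbitrary seed vector,
    repeatedly select the vector maximizing the minimum Euclidean
    distance to the already-selected set."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]
    min_dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# Tiny demo on hypothetical 2-d points:
print(farthest_first([[0, 0], [0.1, 0], [1, 1], [0, 1]], k=3))
```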
Perform oracle classification with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
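The purity and entropy figures reported for the oracle sample are the majority-class fraction and the binary Shannon entropy of the match proportion; a minimal sketch, assuming those standard definitions:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction of the sample; entropy is
    the base-2 Shannon entropy of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1 - p)
    entropy = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

# 23 matches / 65 non-matches, as in the sample above:
print(purity_entropy(23, 65))  # ≈ (0.739, 0.829), matching the log
```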
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
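The SVM step trains on the 88 oracle-labelled vectors and labels the remaining ones; a minimal sketch using scikit-learn's `SVC` on synthetic stand-in data (the actual kernel and parameters are not recorded in the log, so defaults are assumed):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the oracle-labelled training vectors:
# matches cluster near 1.0, non-matches near 0.0 in each dimension.
X_match = rng.uniform(0.7, 1.0, size=(23, 7))
X_non = rng.uniform(0.0, 0.4, size=(65, 7))
X_train = np.vstack([X_match, X_non])
y_train = np.array([True] * 23 + [False] * 65)

clf = SVC()              # default RBF kernel (an assumption, not stated in the log)
clf.fit(X_train, y_train)

# Classify some unlabelled stand-in vectors:
X_rest = rng.uniform(0.0, 1.0, size=(10, 7))
pred = clf.predict(X_rest)
print("Classified", int(pred.sum()), "matches and", int(len(pred) - pred.sum()), "non-matches")
```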
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 101 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle classification with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analyzing file: diverg(15)687_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 687), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)687_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1068
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1068 weight vectors
Containing 226 true matches and 842 true non-matches
(21.16% true matches)
Identified 1011 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 974 (96.34%)
2 : 34 (3.36%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1011 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 821
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1067
Number of unique weight vectors: 1011
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1011, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1011 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1011 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle classification with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 924 weight vectors
Based on 24 matches and 63 non-matches
Classified 131 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(793, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 131 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle classification with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)990_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 990), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)990_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 799
Number of entity ID pairs that occurred more than once: 0
Analyse set of 799 weight vectors
Containing 224 true matches and 575 true non-matches
(28.04% true matches)
Identified 760 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 741 (97.50%)
2 : 16 (2.11%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 760 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 572
Removed 1 non-pure weight vector
Final number of weight vectors to use: 798
Number of unique weight vectors: 760
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(760, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 760 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 760 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
Perform oracle classification with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 675 weight vectors
Based on 29 matches and 56 non-matches
Classified 149 matches and 526 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(526, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 526 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 526 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.538, 0.613, 0.789, 0.227, 0.857] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 7 matches and 67 non-matches
Purity of oracle classification: 0.905
Entropy of oracle classification: 0.452
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
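The purity and entropy figures reported after each oracle round follow directly from the match/non-match counts. A minimal sketch, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # convention: 0 * log2(0) = 0 for pure samples
            entropy -= q * math.log2(q)
    return purity, entropy

# Counts logged above: 7 matches and 67 non-matches out of 74
purity, entropy = purity_and_entropy(7, 67)
print(round(purity, 3), round(entropy, 3))  # 0.905 0.452
```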
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster is not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)965_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 965), dtype: object
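The precision, recall, and f-measure in the record above are consistent with its tp/fp/fn counts. A quick check, assuming the standard definitions:

```python
# Counts from the record above
tp, fp, fn = 52, 1, 247

precision = tp / (tp + fp)  # 52 / 53
recall = tp / (tp + fn)     # 52 / 299
f_measure = 2 * precision * recall / (precision + recall)

print(round(precision, 6), round(recall, 6), round(f_measure, 6))
# 0.981132 0.173913 0.295455
```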
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)965_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 213 true matches and 595 true non-matches
(26.36% true matches)
Identified 754 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 719 (95.36%)
2 : 32 (4.24%)
3 : 2 (0.27%)
19 : 1 (0.13%)
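A frequency table like the one above (and the unique-vector count) can be derived with a Counter. A small sketch, assuming weight vectors are represented as hashable tuples:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of distinct vectors occurring that often."""
    vector_counts = Counter(map(tuple, weight_vectors))
    return Counter(vector_counts.values())

# Toy data: three distinct vectors, one of them occurring twice
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9)]
dist = occurrence_distribution(vectors)
print(sorted(dist.items()))  # [(1, 2), (2, 1)] -> 2 singletons, 1 pair
print(sum(dist.values()))    # 3 unique vectors
```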
Identified 1 non-pure unique weight vector (from 754 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 574
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 754
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(754, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 754 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 754 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
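Farthest-first selection, as used to pick the samples above, greedily adds the vector whose nearest already-selected vector is farthest away. A minimal sketch, assuming Euclidean distance and an arbitrary (first-element) starting pick:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: start from the first vector, then
    repeatedly add the vector maximising the distance to its nearest
    already-selected neighbour."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while remaining and len(selected) < k:
        def min_dist(v):
            return min(math.dist(v, s) for s in selected)
        nxt = max(remaining, key=min_dist)
        remaining.remove(nxt)
        selected.append(nxt)
    return selected

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(points, 3))
# picks the seed point, then the two corners farthest from what is chosen
```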
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster is not pure enough or too large, and can be split further
SVM classification of 669 weight vectors
Based on 27 matches and 58 non-matches
Classified 142 matches and 527 non-matches
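The SVM step splits the remaining unlabelled vectors using the oracle-labelled vectors as training data. As a dependency-free illustration of that train-then-split shape, here is a tiny perceptron stand-in (not the SVM actually used, and with made-up toy data):

```python
def train_perceptron(samples, labels, epochs=100, lr=0.1):
    """Tiny linear classifier standing in for the SVM used in the log."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y in {-1, +1}
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(w, b, x):
    """True = match, False = non-match."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

# Labelled sample: high similarities -> match (+1), low -> non-match (-1)
train = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(train, labels)
print([classify(w, b, x) for x in [(0.95, 0.9), (0.05, 0.1)]])  # [True, False]
```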
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(527, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 142 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster is not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(15)372_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 372), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)372_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 831
Number of entity ID pairs that occurred more than once: 0
Analyse set of 831 weight vectors
Containing 227 true matches and 604 true non-matches
(27.32% true matches)
Identified 774 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 737 (95.22%)
2 : 34 (4.39%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 774 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 583
Removed 1 non-pure weight vector
Final number of weight vectors to use: 830
Number of unique weight vectors: 774
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(774, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 774 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 774 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster is not pure enough or too large, and can be split further
SVM classification of 689 weight vectors
Based on 28 matches and 57 non-matches
Classified 151 matches and 538 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(538, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 538 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 538 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 9 matches and 64 non-matches
Purity of oracle classification: 0.877
Entropy of oracle classification: 0.539
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster is not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)928_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984375
recall 0.210702
f-measure 0.347107
da 64
dm 0
ndm 0
tp 63
fp 1
tn 4.76529e+07
fn 236
Name: (10, 1 - acm diverg, 928), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)928_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 696
Number of entity ID pairs that occurred more than once: 0
Analyse set of 696 weight vectors
Containing 198 true matches and 498 true non-matches
(28.45% true matches)
Identified 664 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 648 (97.59%)
2 : 13 (1.96%)
3 : 2 (0.30%)
16 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 664 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 168
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 495
Removed 1 non-pure weight vector
Final number of weight vectors to use: 695
Number of unique weight vectors: 664
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(664, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 664 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 664 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
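The "far" selection logged above is a greedy farthest-first traversal: each new vector maximises its minimum distance to the vectors already picked. A minimal sketch, assuming Euclidean distance and a random starting vector (neither is stated in the log):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedily pick k vectors, each maximising its minimum
    distance to the vectors selected so far."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    chosen = [int(rng.integers(len(X)))]           # arbitrary start (assumption)
    # minimum distance from every vector to the selected set
    d = np.linalg.norm(X - X[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(d))                    # farthest remaining vector
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen]
```

This max-min criterion is why the listed vectors are so spread out, mixing all-high, all-low, and mixed weight profiles.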
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 35 matches and 49 non-matches
Purity of oracle classification: 0.583
Entropy of oracle classification: 0.980
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 49
Number of false non-matches: 0
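The purity and entropy figures follow directly from the oracle's match/non-match counts: purity is the majority-class fraction, entropy the binary Shannon entropy of the match proportion. A sketch (function name is illustrative, not from the original code):

```python
import math

def purity_entropy(num_match, num_non_match):
    """Majority-class fraction and binary Shannon entropy
    of a cluster's match/non-match label counts."""
    total = num_match + num_non_match
    p = num_match / total                    # match proportion
    purity = max(p, 1.0 - p)                 # majority-class fraction
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)      # binary entropy, 0*log(0) := 0
    return purity, entropy
```

For the 35 matches and 49 non-matches above this gives purity 49/84 ≈ 0.583 and entropy ≈ 0.980, matching the logged values.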
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 580 weight vectors
Based on 35 matches and 49 non-matches
Classified 270 matches and 310 non-matches
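The SVM step trains on the oracle-labelled sample and splits the remaining unlabelled vectors into predicted matches and non-matches, which become the two child clusters in the queue. A minimal scikit-learn sketch; the kernel choice and function name are assumptions, as the log does not state them:

```python
import numpy as np
from sklearn import svm

def svm_split(train_match, train_non_match, remaining):
    """Fit an SVM on oracle-labelled weight vectors, then split the
    unlabelled remainder into predicted match / non-match sets."""
    X = np.vstack([train_match, train_non_match])
    y = np.array([1] * len(train_match) + [0] * len(train_non_match))
    clf = svm.SVC(kernel="linear")       # kernel is an assumption
    clf.fit(X, y)
    rem = np.asarray(remaining, dtype=float)
    pred = clf.predict(rem)
    return rem[pred == 1], rem[pred == 0]
```

Here the 35/49 labelled vectors split the remaining 580 into clusters of 270 and 310, which then both appear in the queue in Loop 2.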
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(270, 0.5833333333333334, 0.9798687566511527, 0.4166666666666667)
(310, 0.5833333333333334, 0.9798687566511527, 0.4166666666666667)
Current size of match and non-match training data sets: 35 / 49
Selected cluster with (queue ordering: random):
- Purity 0.58 and entropy 0.98
- Size 270 weight vectors
- Estimated match proportion 0.417
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 270 vectors
The selected farthest weight vectors are:
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 45 matches and 24 non-matches
Purity of oracle classification: 0.652
Entropy of oracle classification: 0.932
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 24
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
64.0
Analysing the file: diverg(15)196_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (15, 1 - acm diverg, 196), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)196_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 759
Number of entity ID pairs that occurred more than once: 0
Analyse set of 759 weight vectors
Containing 185 true matches and 574 true non-matches
(24.37% true matches)
Identified 735 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 722 (98.23%)
2 : 10 (1.36%)
3 : 2 (0.27%)
11 : 1 (0.14%)
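The occurrence distribution above can be reproduced by counting duplicate weight vectors, e.g. with collections.Counter (a sketch; tuples stand in for the actual weight rows):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of unique
    weight vectors that occur exactly that often."""
    per_vector = Counter(tuple(v) for v in vectors)   # vector -> count
    dist = Counter(per_vector.values())               # count -> #vectors
    return dict(sorted(dist.items()))
```

Applied to the 759 vectors here, such a count yields the 722/10/2/1 breakdown over occurrence counts 1, 2, 3, and 11.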
Identified 1 non-pure unique weight vector (from 735 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 163
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 571
Removed 1 non-pure weight vector
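A vector's pureness is the fraction of its occurrences labelled as matches; in the table above, the vector with pureness 0.909 has its single minority-class occurrence removed so every surviving vector carries one label. A sketch of that filtering, with illustrative names:

```python
from collections import defaultdict

def remove_minority(labelled_vectors):
    """labelled_vectors: list of (weight_tuple, is_match) pairs.
    For each unique vector, keep only its majority-class copies."""
    groups = defaultdict(list)
    for vec, is_match in labelled_vectors:
        groups[vec].append(is_match)
    kept = []
    for vec, labels in groups.items():
        majority = sum(labels) * 2 >= len(labels)   # tie -> match (assumption)
        kept.extend((vec, lab) for lab in labels if lab == majority)
    return kept
```

This is why the final count drops from 759 to 758 while the number of unique vectors stays at 735.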
Final number of weight vectors to use: 758
Number of unique weight vectors: 735
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(735, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 735 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 735 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 650 weight vectors
Based on 30 matches and 55 non-matches
Classified 126 matches and 524 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(126, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(524, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.94
- Size 524 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 524 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.522, 0.786, 0.800, 0.824, 0.667] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 6 matches and 69 non-matches
Purity of oracle classification: 0.920
Entropy of oracle classification: 0.402
Number of true matches: 6
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing the file: diverg(15)607_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 607), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)607_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 837
Number of entity ID pairs that occurred more than once: 0
Analyse set of 837 weight vectors
Containing 220 true matches and 617 true non-matches
(26.28% true matches)
Identified 781 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 745 (95.39%)
2 : 33 (4.23%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 781 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 596
Removed 1 non-pure weight vector
Final number of weight vectors to use: 836
Number of unique weight vectors: 781
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(781, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 781 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 781 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 696 weight vectors
Based on 28 matches and 57 non-matches
Classified 154 matches and 542 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(154, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(542, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 154 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 154 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 47 matches and 8 non-matches
Purity of oracle classification: 0.855
Entropy of oracle classification: 0.598
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
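The purity and entropy figures printed after each oracle round follow the standard binary definitions: purity is the majority-class fraction of the classified sample, and entropy is the binary entropy (in bits) of the match proportion. A minimal sketch (the function name is mine, not from the tool):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction of the sample; entropy = binary
    entropy (in bits) of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# e.g. the round above: 47 matches, 8 non-matches
# rounds to purity 0.855 and entropy 0.598, as reported
purity, entropy = purity_entropy(47, 8)
```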
46.0
Analysing file: diverg(15)974_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 974), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)974_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 781
Number of entity ID pairs that occurred more than once: 0
Analyse set of 781 weight vectors
Containing 206 true matches and 575 true non-matches
(26.38% true matches)
Identified 752 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 735 (97.74%)
2 : 14 (1.86%)
3 : 2 (0.27%)
12 : 1 (0.13%)
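A distribution like the one above can be produced with two `Counter` passes: one over the vectors themselves, one over their occurrence counts. A sketch assuming the weight vectors arrive as lists of floats (the function name is hypothetical):

```python
from collections import Counter

def occurrence_histogram(weight_vectors):
    """First count how often each distinct weight vector occurs, then
    tally how many distinct vectors share each occurrence count."""
    per_vector = Counter(map(tuple, weight_vectors))
    return Counter(per_vector.values())

# one vector occurring twice, one occurring once
hist = occurrence_histogram([[1.0, 0.5], [1.0, 0.5], [0.0, 0.2]])
# → Counter({2: 1, 1: 1})
```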
Identified 1 non-pure unique weight vector (from 752 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 572
Removed 1 non-pure weight vector
Final number of weight vectors to use: 780
Number of unique weight vectors: 752
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(752, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 752 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 752 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.429, 0.786, 0.750, 0.389, 0.857] (False)
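The "far" initial-selection method is the classic greedy farthest-first traversal: pick a start vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A NumPy sketch; the starting rule and the Euclidean metric are assumptions here, the tool's exact choices may differ:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal over weight vectors.

    Starts from a (here: randomly chosen) vector, then repeatedly adds
    the vector whose minimum Euclidean distance to the already-selected
    set is largest, until k vectors are selected."""
    vecs = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vecs)))]
    # running minimum distance from every vector to the selected set
    min_dist = np.linalg.norm(vecs - vecs[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vecs - vecs[nxt], axis=1))
    return selected
```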
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 667 weight vectors
Based on 29 matches and 56 non-matches
Classified 137 matches and 530 non-matches
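The SVM step trains a binary classifier on the oracle-labelled sample (here 29 matches and 56 non-matches) and uses it to split the remaining weight vectors into two child clusters. A minimal scikit-learn sketch; the kernel and the 1/0 label encoding are assumptions, not taken from the original tool:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, rest_vecs):
    """Fit a binary SVM on the oracle-labelled sample (1 = match,
    0 = non-match) and split the remaining weight vectors into
    predicted matches and predicted non-matches."""
    clf = SVC(kernel="linear")
    clf.fit(np.asarray(train_vecs, dtype=float), train_labels)
    rest = np.asarray(rest_vecs, dtype=float)
    pred = clf.predict(rest)
    return rest[pred == 1], rest[pred == 0]
```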
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(137, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(530, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 137 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 137 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 51 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.232
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)535_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990291
recall 0.341137
f-measure 0.507463
da 103
dm 0
ndm 0
tp 102
fp 1
tn 4.76529e+07
fn 197
Name: (10, 1 - acm diverg, 535), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)535_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 563
Number of entity ID pairs that occurred more than once: 0
Analyse set of 563 weight vectors
Containing 147 true matches and 416 true non-matches
(26.11% true matches)
Identified 546 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 537 (98.35%)
2 : 6 (1.10%)
3 : 2 (0.37%)
8 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 546 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 132
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 413
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 555
Number of unique weight vectors: 545
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(545, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 545 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 545 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.538, 0.789, 0.353, 0.545, 0.550] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 31 matches and 50 non-matches
Purity of oracle classification: 0.617
Entropy of oracle classification: 0.960
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 464 weight vectors
Based on 31 matches and 50 non-matches
Classified 107 matches and 357 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(107, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
(357, 0.6172839506172839, 0.9599377175669783, 0.38271604938271603)
Current size of match and non-match training data sets: 31 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 357 weight vectors
- Estimated match proportion 0.383
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 357 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[1.000, 0.000, 0.565, 0.667, 0.600, 0.412, 0.381] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.667, 0.000, 0.850, 0.733, 0.652, 0.778, 0.474] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.778, 0.429, 0.571, 0.750, 0.600] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.889, 0.000, 0.714, 0.700, 0.500, 0.636, 0.765] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 0.000, 0.750, 0.905, 0.667, 0.500, 0.571] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 1 match and 71 non-matches
Purity of oracle classification: 0.986
Entropy of oracle classification: 0.106
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
103.0
Analysing file: diverg(15)436_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 436), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)436_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 407
Number of entity ID pairs that occurred more than once: 0
Analyse set of 407 weight vectors
Containing 217 true matches and 190 true non-matches
(53.32% true matches)
Identified 370 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 352 (95.14%)
2 : 15 (4.05%)
3 : 2 (0.54%)
19 : 1 (0.27%)
Identified 1 non-pure unique weight vector (from 370 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 187
Removed 1 non-pure weight vector
Final number of weight vectors to use: 406
Number of unique weight vectors: 370
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(370, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 370 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 76
Perform initial selection using "far" method
Farthest first selection of 76 weight vectors from 370 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 29 matches and 47 non-matches
Purity of oracle classification: 0.618
Entropy of oracle classification: 0.959
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 47
Number of false non-matches: 0
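The purity and entropy figures reported after each oracle round (here purity 0.618 and entropy 0.959 for the 29 / 47 split) are the majority-class fraction and the binary Shannon entropy of the match proportion. A minimal sketch that reproduces them; the function name is illustrative, not from the original tool:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction; entropy is the binary
    (Shannon) entropy of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total  # estimated match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The 29 / 47 oracle split above gives purity 0.618 and entropy 0.959
purity, entropy = purity_and_entropy(29, 47)
print(round(purity, 3), round(entropy, 3))  # 0.618 0.959
```

The same match proportion (29/76 = 0.3816) is what the log later reports as the estimated match proportion of the two child clusters.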
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 294 weight vectors
Based on 29 matches and 47 non-matches
Classified 145 matches and 149 non-matches
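The SVM step trains on the oracle-labelled vectors and splits the remaining cluster by predicted class. A sketch using scikit-learn's `SVC` with random stand-in data; the kernel, parameters, and data here are assumptions, not taken from the original tool:

```python
# Sketch of the SVM-based cluster split; kernel choice is an assumption.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the oracle-labelled training vectors
# (29 matches, 47 non-matches) and the 294 remaining unlabelled vectors.
X_train = rng.random((76, 7))
y_train = np.array([True] * 29 + [False] * 47)
X_rest = rng.random((294, 7))

clf = SVC(kernel="linear").fit(X_train, y_train)
pred = clf.predict(X_rest)

# The cluster is split into a predicted-match and a predicted-non-match
# part; both parts go back onto the queue for further refinement.
match_cluster = X_rest[pred]
non_match_cluster = X_rest[~pred]
print(len(match_cluster) + len(non_match_cluster))  # 294
```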
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 76
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.618421052631579, 0.959149554396894, 0.3815789473684211)
(149, 0.618421052631579, 0.959149554396894, 0.3815789473684211)
Current size of match and non-match training data sets: 29 / 47
Selected cluster with (queue ordering: random):
- Purity 0.62 and entropy 0.96
- Size 149 weight vectors
- Estimated match proportion 0.382
Sample size for this cluster: 57
Farthest first selection of 57 weight vectors from 149 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
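The farthest-first sampling used above can be sketched as a greedy traversal that repeatedly picks the vector whose minimum distance to the already-selected vectors is largest. Euclidean distance and the starting index are assumptions; the original tool's metric is not shown in the log:

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    minimum Euclidean distance to all selected vectors is largest."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # Distance from every vector to the nearest selected vector so far
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1)
        )
    return selected

# Toy example: from four 1-D points, pick the three most spread out
idx = farthest_first([[0.0], [0.1], [0.5], [1.0]], k=3)
print(idx)  # [0, 3, 2]
```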
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and wrongly classify 0
Classified 9 matches and 48 non-matches
Purity of oracle classification: 0.842
Entropy of oracle classification: 0.629
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)318_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 318), dtype: object
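The metrics in the block above are consistent with its tp/fp/fn counts: precision = tp/(tp+fp), recall = tp/(tp+fn), and f-measure is their harmonic mean. A quick check (a sketch, not the original evaluation code):

```python
tp, fp, fn = 39, 0, 260

precision = tp / (tp + fp)  # 39 / 39 = 1.0
recall = tp / (tp + fn)     # 39 / 299
# f-measure: harmonic mean of precision and recall
f_measure = 2 * precision * recall / (precision + recall)

print(round(recall, 6), round(f_measure, 6))  # 0.130435 0.230769
```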
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)318_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
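The occurrence histogram above (e.g. 990 of the 1027 unique vectors occur exactly once) can be built with two nested `Counter`s; the tiny vector list here is a hypothetical stand-in for the real weight vectors:

```python
from collections import Counter

# Hypothetical stand-in for the list of weight-vector tuples
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3),
           (0.9, 0.9), (0.9, 0.9), (0.9, 0.9)]

per_vector = Counter(vectors)             # how often each vector occurs
histogram = Counter(per_vector.values())  # occurrence -> number of vectors

for occ in sorted(histogram):
    n = histogram[occ]
    print(f"{occ} : {n} ({100.0 * n / len(per_vector):.2f}%)")
```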
Identified 1 non-pure unique weight vector (from 1027 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
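The pureness filtering logged above computes, per unique weight vector, the fraction of its occurrences that are true matches, and drops the minority-class copies of any non-pure vector (here the single non-match copy of the 0.950-pure vector, taking 1084 vectors down to 1083). A sketch under that reading; the tie-breaking rule is an assumption:

```python
from collections import defaultdict

def remove_minority_copies(pairs):
    """pairs: list of (weight_vector_tuple, is_match).  For each unique
    vector, keep only the copies of its majority class, so every
    remaining vector is pure (pureness 0.0 or 1.0)."""
    by_vector = defaultdict(list)
    for vec, is_match in pairs:
        by_vector[vec].append(is_match)
    kept = []
    for vec, labels in by_vector.items():
        # Assumed tie-break: ties go to the match class
        majority = sum(labels) * 2 >= len(labels)
        kept.extend((vec, majority) for lab in labels if lab == majority)
    return kept

# A vector occurring 20 times (19 matches, 1 non-match) has pureness
# 0.950; its single minority-class copy is removed, as in the log above.
pairs = [(("v",), True)] * 19 + [(("v",), False)]
print(len(remove_minority_copies(pairs)))  # 19
```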
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 30 matches and 58 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 30 matches and 58 non-matches
Classified 179 matches and 760 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(179, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
(760, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
Current size of match and non-match training data sets: 30 / 58
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 179 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 179 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 43 matches and 15 non-matches
Purity of oracle classification: 0.741
Entropy of oracle classification: 0.825
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 15
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)710_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 710), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)710_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 712
Number of entity ID pairs that occurred more than once: 0
Analyse set of 712 weight vectors
Containing 201 true matches and 511 true non-matches
(28.23% true matches)
Identified 667 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 633 (94.90%)
2 : 31 (4.65%)
3 : 2 (0.30%)
11 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 667 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 490
Removed 1 non-pure weight vector
Final number of weight vectors to use: 711
Number of unique weight vectors: 667
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(667, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 667 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 667 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 26 matches and 58 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.893
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 583 weight vectors
Based on 26 matches and 58 non-matches
Classified 123 matches and 460 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
(460, 0.6904761904761905, 0.8926230133850986, 0.30952380952380953)
Current size of match and non-match training data sets: 26 / 58
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 123 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(20)499_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 499), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)499_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
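The "far" method used above is a farthest-first traversal: starting from one vector, repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and a fixed starting index (the tool's actual start choice and tie-breaking are not shown in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k vectors: each step picks the
    vector maximising its minimum distance to the vectors chosen so far."""
    selected = [start]
    min_dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # keep, per vector, the distance to its closest selected vector
        min_dist = [min(d, math.dist(v, vectors[nxt]))
                    for d, v in zip(min_dist, vectors)]
    return selected
```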
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
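The logged purity 0.674 and entropy 0.910 follow directly from the 28 match / 58 non-match split, and the Loop 2 queue below shows both child clusters inheriting exactly these sample statistics (0.6744…, 0.9103…, 0.3256…). A sketch of the two measures:

```python
import math

def purity_entropy(num_match, num_non_match):
    """Purity is the majority-class fraction of the sample; entropy is
    the binary Shannon entropy (in bits) of the match proportion."""
    p = num_match / (num_match + num_non_match)
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = purity_entropy(28, 58)    # the Loop 1 oracle sample
print(round(purity, 3), round(entropy, 3))  # -> 0.674 0.91
```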
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
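The log does not show how the SVM split is computed. A minimal stand-in is a linear SVM trained with the Pegasos stochastic sub-gradient method on the oracle-labelled sample (28 matches, 58 non-matches), then used to assign each remaining vector to the match or non-match side. This is a from-scratch sketch under the assumption of a linear kernel; the tool's actual classifier may differ:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the regularised
    hinge-loss objective of a linear SVM; labels are +1 / -1."""
    rnd = random.Random(seed)
    w = [0.0] * len(X[0])
    b, t = 0.0, 0
    for _ in range(epochs):
        for i in rnd.sample(range(len(X)), len(X)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step
            if margin < 1.0:                          # hinge-loss step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def classify(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1
```

Each side of the predicted split then becomes its own cluster in the queue, which is why Loop 2 below starts with queue length 2.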
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 153 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 153 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)800_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 800), dtype: object
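The dumped precision, recall and f-measure are consistent with the confusion counts shown alongside them (tp 45, fp 1, fn 254). As a quick check:

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from the confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f

# counts from the dumped Series: reproduces 0.978261 / 0.150502 / 0.26087
print(prf(45, 1, 254))
```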
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)800_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1038 unique weight vectors)
Pureness (as the fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.05 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 103 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(10)406_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 406), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)406_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 469
Number of entity ID pairs that occurred more than once: 0
Analyse set of 469 weight vectors
Containing 167 true matches and 302 true non-matches
(35.61% true matches)
Identified 452 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 441 (97.57%)
2 : 8 (1.77%)
3 : 2 (0.44%)
6 : 1 (0.22%)
Identified 0 non-pure unique weight vectors (from 452 unique weight vectors)
Pureness (as the fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 150
0.000 : 302
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 469
Number of unique weight vectors: 452
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(452, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 452 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 79
Perform initial selection using "far" method
Farthest first selection of 79 weight vectors from 452 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.680, 0.000, 0.609, 0.737, 0.600, 0.529, 0.696] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.462, 0.609, 0.643, 0.706, 0.786] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.500, 0.826, 0.429, 0.538, 0.636] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 79 weight vectors
The oracle will correctly classify 79 weight vectors and wrongly classify 0
Classified 26 matches and 53 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 79 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 373 weight vectors
Based on 26 matches and 53 non-matches
Classified 131 matches and 242 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 79
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.6708860759493671, 0.9140185106642176, 0.3291139240506329)
(242, 0.6708860759493671, 0.9140185106642176, 0.3291139240506329)
Current size of match and non-match training data sets: 26 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 242 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 63
Farthest first selection of 63 weight vectors from 242 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.818, 0.762, 0.714, 0.500, 0.400] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[0.770, 0.000, 0.737, 0.667, 0.261, 0.533, 0.391] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.815, 0.643, 0.800, 0.750, 0.429] (False)
[1.000, 0.000, 0.423, 0.609, 0.857, 0.361, 0.688] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.571, 0.867, 0.471, 0.583, 0.643] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.533, 0.000, 0.667, 0.643, 0.500, 0.529, 0.435] (False)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.611, 0.000, 0.800, 0.684, 0.500, 0.778, 0.609] (False)
[1.000, 0.000, 0.318, 0.581, 0.526, 0.250, 0.571] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.682, 0.667, 0.286, 0.700, 0.533] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
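The farthest-first selection logged above is a greedy max-min traversal: each step adds the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and a deterministic start from the first vector (the actual implementation may choose the start vector differently):

```python
import math

def farthest_first(vectors, k):
    # Greedy max-min traversal: repeatedly add the candidate whose
    # nearest already-selected vector is farthest away.
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < min(k, len(vectors)) and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```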
Perform oracle with 100.00% accuracy on 63 weight vectors
The oracle will correctly classify 63 weight vectors and wrongly classify 0
Classified 0 matches and 63 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
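The purity and entropy figures reported for each oracle classification can be reproduced from the match / non-match counts: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the match proportion. A minimal sketch:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    # Purity: fraction of the majority class; entropy: binary Shannon
    # entropy of the match proportion (0 bits when the split is pure).
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    if p in (0.0, 1.0):
        return purity, 0.0
    entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return purity, entropy
```

For the 0 / 63 split above this yields purity 1.000 and entropy 0.000, as logged.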
*** Warning: Oracle returns an empty match dictionary ***
Deleted 63 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing the file: diverg(10)125_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 125), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)125_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 602
Number of entity ID pairs that occurred more than once: 0
Analyse set of 602 weight vectors
Containing 172 true matches and 430 true non-matches
(28.57% true matches)
Identified 582 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 571 (98.11%)
2 : 8 (1.37%)
3 : 2 (0.34%)
9 : 1 (0.17%)
Identified 1 non-pure unique weight vector (from 582 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 154
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 427
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 593
Number of unique weight vectors: 581
Time to load and analyse the weight vector file: 0.01 sec
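The duplicate analysis above (frequency distribution, pureness, removal of non-pure vectors) can be sketched as follows. This assumes pureness is the majority-class fraction of a unique vector's occurrences, and that the minority-class copies of a non-pure unique vector are dropped; note the log messages differ between runs on whether all copies or only the minority copies are removed, so this is one plausible variant:

```python
from collections import Counter

def remove_non_pure(vectors, labels):
    # Group duplicate weight vectors, compute each unique vector's
    # pureness (majority-class fraction of its occurrences), and drop
    # the minority-class copies of any non-pure unique vector.
    groups = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(lab)
    kept = []
    for vec, labs in groups.items():
        counts = Counter(labs)
        majority, majority_count = counts.most_common(1)[0]
        if majority_count < len(labs):          # non-pure unique vector
            labs = [majority] * majority_count  # drop minority copies
        kept.extend((vec, lab) for lab in labs)
    return kept
```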
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(581, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 581 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 581 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 29 matches and 53 non-matches
Purity of oracle classification: 0.646
Entropy of oracle classification: 0.937
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 499 weight vectors
Based on 29 matches and 53 non-matches
Classified 126 matches and 373 non-matches
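The split step logged above trains a classifier on the oracle-labelled samples and partitions the cluster's remaining weight vectors into predicted matches and predicted non-matches, which become the two child clusters in the queue. A rough sketch, assuming scikit-learn's SVC with default settings (the actual kernel and parameters are not shown in the log):

```python
from sklearn.svm import SVC

def split_cluster(match_train, non_match_train, remaining):
    # Train a binary SVM on the oracle-labelled weight vectors, then
    # partition the unlabelled remainder into predicted matches and
    # predicted non-matches (the two child clusters pushed to the queue).
    X = match_train + non_match_train
    y = [1] * len(match_train) + [0] * len(non_match_train)
    clf = SVC().fit(X, y)
    preds = clf.predict(remaining)
    matches = [v for v, p in zip(remaining, preds) if p == 1]
    non_matches = [v for v, p in zip(remaining, preds) if p == 0]
    return matches, non_matches
```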
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(126, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
(373, 0.6463414634146342, 0.9372930661967527, 0.35365853658536583)
Current size of match and non-match training data sets: 29 / 53
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.94
- Size 126 weight vectors
- Estimated match proportion 0.354
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 126 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 45 matches and 7 non-matches
Purity of oracle classification: 0.865
Entropy of oracle classification: 0.570
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(10)879_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984615
recall 0.214047
f-measure 0.351648
da 65
dm 0
ndm 0
tp 64
fp 1
tn 4.76529e+07
fn 235
Name: (10, 1 - acm diverg, 879), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)879_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 522
Number of entity ID pairs that occurred more than once: 0
Analyse set of 522 weight vectors
Containing 188 true matches and 334 true non-matches
(36.02% true matches)
Identified 496 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 484 (97.58%)
2 : 9 (1.81%)
3 : 2 (0.40%)
14 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 496 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 162
0.929 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 333
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 521
Number of unique weight vectors: 496
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(496, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 496 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 496 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 31 matches and 49 non-matches
Purity of oracle classification: 0.613
Entropy of oracle classification: 0.963
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 49
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 416 weight vectors
Based on 31 matches and 49 non-matches
Classified 133 matches and 283 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(133, 0.6125, 0.9631672450918832, 0.3875)
(283, 0.6125, 0.9631672450918832, 0.3875)
Current size of match and non-match training data sets: 31 / 49
Selected cluster with (queue ordering: random):
- Purity 0.61 and entropy 0.96
- Size 283 weight vectors
- Estimated match proportion 0.388
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 283 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[1.000, 0.000, 0.864, 0.667, 0.435, 0.700, 0.600] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.846, 0.857, 0.353, 0.318, 0.400] (False)
[0.680, 0.000, 0.609, 0.737, 0.600, 0.529, 0.696] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[1.000, 0.000, 0.296, 0.600, 0.471, 0.600, 0.643] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.222, 0.643, 0.800, 0.750, 0.692] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 4 matches and 65 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.319
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
65.0
Analysing the file: diverg(20)62_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 62), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)62_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 24 matches and 64 non-matches
Purity of oracle classification: 0.727
Entropy of oracle classification: 0.845
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
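The purity and entropy figures logged above follow directly from the oracle's match/non-match counts: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A minimal sketch (the function name is illustrative, not from the original code):

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity: majority-class fraction. Entropy: binary Shannon
    entropy (in bits) of the match proportion."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 24 matches and 64 non-matches, as classified above
purity, entropy = purity_entropy(24, 64)
print(round(purity, 3), round(entropy, 3))  # 0.727 0.845
```

These reproduce the "Purity 0.727 / Entropy 0.845" values reported for this cluster (and the long-form tuples printed in the queue listings).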
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 24 matches and 64 non-matches
Classified 91 matches and 857 non-matches
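The SVM step trains on the oracle-labelled sample and classifies the remaining vectors of the cluster into the two child clusters. A hedged sketch using scikit-learn's `SVC`; the kernel, parameters, and data here are assumptions (the log does not show them), with synthetic stand-in vectors mirroring the counts above:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Stand-in data: 24 match and 64 non-match training vectors of
# 7 similarity weights each, plus 948 unlabelled cluster vectors,
# mirroring the counts in the log.
X_train = np.vstack([rng.uniform(0.5, 1.0, (24, 7)),   # matches
                     rng.uniform(0.0, 0.6, (64, 7))])  # non-matches
y_train = np.array([1] * 24 + [0] * 64)
X_rest = rng.uniform(0.0, 1.0, (948, 7))

clf = SVC(kernel="linear")   # kernel choice is an assumption
clf.fit(X_train, y_train)
pred = clf.predict(X_rest)
print(f"Classified {pred.sum()} matches and {(pred == 0).sum()} non-matches")
```

The two predicted groups then re-enter the queue as separate clusters, as seen in the "Loop 2" listing that follows.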
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(91, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
(857, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
Current size of match and non-match training data sets: 24 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.73 and entropy 0.85
- Size 857 weight vectors
- Estimated match proportion 0.273
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 857 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
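The farthest-first samples above can be obtained with a greedy k-center traversal: repeatedly add the vector whose minimum Euclidean distance to the already-selected set is largest. A minimal sketch (starting from index 0 for determinism; the original seeding strategy is not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k):
    """Greedily pick k indices so each new pick maximises the
    minimum distance to the vectors picked so far."""
    X = np.asarray(vectors, dtype=float)
    selected = [0]                              # deterministic start (assumption)
    # minimum distance of every vector to the selected set
    d = np.linalg.norm(X - X[0], axis=1)
    while len(selected) < min(k, len(X)):
        nxt = int(np.argmax(d))
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# Toy example: the two extremes are picked before the near-duplicates
print(farthest_first([[0.0], [0.1], [1.0], [0.05]], 2))  # [0, 2]
```

This is why the selected samples are spread across the weight-vector space rather than drawn uniformly from it.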
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 18 matches and 52 non-matches
Purity of oracle classification: 0.743
Entropy of oracle classification: 0.822
Number of true matches: 18
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)82_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 82), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)82_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
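The occurrence distribution above can be derived by counting duplicate weight vectors; a minimal sketch using `collections.Counter`, with vector tuples as dictionary keys (the data here is illustrative):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map 'occurrence count' -> 'number of unique vectors occurring that often'."""
    per_vector = Counter(tuple(v) for v in vectors)  # vector -> occurrences
    return Counter(per_vector.values())              # occurrences -> count

vecs = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3),
        (0.9, 0.1), (0.9, 0.1), (0.9, 0.1)]
print(sorted(occurrence_distribution(vecs).items()))  # [(1, 1), (2, 1), (3, 1)]
```

As a consistency check on the logged table: 880·1 + 33·2 + 2·3 + 1·19 = 971 vectors over 916 unique ones.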
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
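The pureness filter groups identical weight vectors, computes each group's fraction of true matches, and drops the minority-class copies of any mixed group (e.g. the one non-match among the 19 copies at pureness 0.947 above). A hedged sketch of one plausible implementation:

```python
from collections import defaultdict

def remove_non_pure(vectors, labels):
    """Drop minority-class copies of any weight vector whose copies
    carry mixed True/False labels; keep everything else."""
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, lab in zip(vectors, labels):
        labs = groups[tuple(vec)]
        match_frac = sum(labs) / len(labs)
        if 0.0 < match_frac < 1.0:        # non-pure group
            majority = match_frac >= 0.5
            if lab != majority:
                continue                  # drop the minority copy
        kept.append((vec, lab))
    return kept

# 19 copies labelled True plus 1 labelled False: the False copy is removed
vecs = [(1.0, 1.0)] * 20
labs = [True] * 19 + [False]
print(len(remove_non_pure(vecs, labs)))  # 19
```

This matches the bookkeeping in the log: one vector removed, 971 − 1 = 970 vectors to use.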
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 123 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 47 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)374_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 374), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)374_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 863
Number of entity ID pairs that occurred more than once: 0
Analyse set of 863 weight vectors
Containing 195 true matches and 668 true non-matches
(22.60% true matches)
Identified 811 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 775 (95.56%)
2 : 33 (4.07%)
3 : 2 (0.25%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 811 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 163
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 647
Removed 1 non-pure weight vector
Final number of weight vectors to use: 862
Number of unique weight vectors: 811
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(811, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 811 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 811 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 30 matches and 56 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.933
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 725 weight vectors
Based on 30 matches and 56 non-matches
Classified 153 matches and 572 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
(572, 0.6511627906976745, 0.9330252953592911, 0.3488372093023256)
Current size of match and non-match training data sets: 30 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 572 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 76
Farthest first selection of 76 weight vectors from 572 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.673, 0.000, 0.500, 0.737, 0.500, 0.818, 0.739] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.667, 0.400, 0.583, 0.563] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.462, 0.889, 0.455, 0.211, 0.375] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
Perform oracle with 100.00% accuracy on 76 weight vectors
The oracle will correctly classify 76 weight vectors and wrongly classify 0
Classified 0 matches and 76 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 76
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 76 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
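The purity and entropy figures printed in the oracle blocks can be reproduced from the match/non-match counts alone. A minimal sketch, assuming purity is the majority-class fraction and entropy is the binary Shannon entropy in bits (both consistent with the values this log prints):

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity = majority-class fraction of the classified sample;
    entropy = binary Shannon entropy (in bits) of the match/non-match split."""
    total = n_match + n_nonmatch
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```

For the 0 matches / 76 non-matches split in the block above this gives purity 1.000 and entropy 0.000, matching the lines printed by the tool.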
Analysing file: diverg(20)120_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 120), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)120_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1084
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1084 weight vectors
Containing 227 true matches and 857 true non-matches
(20.94% true matches)
Identified 1027 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 990 (96.40%)
2 : 34 (3.31%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vectors (from 1027 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 836
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1083
Number of unique weight vectors: 1027
Time to load and analyse the weight vector file: 0.01 sec
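The frequency distribution and pureness statistics above amount to counting duplicate weight vectors and the match fraction within each unique vector. A sketch using hypothetical data (the tuples below are illustrative, not rows from the CSV):

```python
from collections import Counter

# hypothetical (weight_vector, is_match) pairs standing in for the loaded CSV rows
rows = [
    ((1.0, 0.0), False),
    ((1.0, 0.0), False),   # duplicate vector -> occurrence count 2
    ((0.9, 1.0), True),
]

occurrence = Counter(vec for vec, _ in rows)     # unique vector -> how often it occurs
freq_dist = Counter(occurrence.values())         # occurrence count -> number of vectors
pureness = {vec: sum(match for v, match in rows if v == vec) / n
            for vec, n in occurrence.items()}    # match fraction per unique vector
```

A "non-pure" unique weight vector is then one whose pureness lies strictly between 0 and 1, i.e. the same vector was observed both as a match and as a non-match.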
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1027, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1027 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1027 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
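The "farthest first" selection above is, in its usual formulation, a greedy max-min traversal: start from some vector and repeatedly add the vector whose distance to its nearest already-selected vector is largest. A sketch under that assumption (the tool's actual seeding and distance metric are not shown in the log):

```python
import numpy as np

def farthest_first(points, k, start=0):
    """Greedy farthest-first traversal: repeatedly add the point whose
    distance to its nearest already-selected point is largest."""
    points = np.asarray(points, dtype=float)
    selected = [start]
    # distance of every point to its nearest selected point so far
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return selected
```

For example, `farthest_first([[0.0], [1.0], [10.0], [11.0]], 3)` picks indices `[0, 3, 1]`: the start point, the point farthest from it, then the point farthest from both.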
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 30 matches and 58 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 939 weight vectors
Based on 30 matches and 58 non-matches
Classified 159 matches and 780 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(159, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
(780, 0.6590909090909091, 0.9256859869821299, 0.3409090909090909)
Current size of match and non-match training data sets: 30 / 58
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 159 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 159 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 48 matches and 8 non-matches
Purity of oracle classification: 0.857
Entropy of oracle classification: 0.592
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 8
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)785_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (15, 1 - acm diverg, 785), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)785_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 697
Number of entity ID pairs that occurred more than once: 0
Analyse set of 697 weight vectors
Containing 169 true matches and 528 true non-matches
(24.25% true matches)
Identified 678 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 665 (98.08%)
2 : 10 (1.47%)
3 : 2 (0.29%)
6 : 1 (0.15%)
Identified 0 non-pure unique weight vectors (from 678 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 152
0.000 : 526
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 697
Number of unique weight vectors: 678
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(678, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 678 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 678 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 594 weight vectors
Based on 29 matches and 55 non-matches
Classified 111 matches and 483 non-matches
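The SVM step above trains on the oracle-labelled sample (here 29 matches and 55 non-matches) and then labels the remaining unclassified weight vectors, splitting the cluster into a predicted-match and a predicted-non-match part. A minimal scikit-learn sketch; the kernel and all parameters are assumptions, since the log does not state them:

```python
import numpy as np
from sklearn.svm import SVC

# tiny illustrative stand-in for the oracle-labelled weight vectors
X_labelled = np.array([[0.1], [0.2], [0.8], [0.9]])
y_labelled = np.array([0, 0, 1, 1])   # 0 = non-match, 1 = match

clf = SVC(kernel="linear")            # assumed kernel; not stated in the log
clf.fit(X_labelled, y_labelled)

X_rest = np.array([[0.15], [0.85]])   # the still-unlabelled remainder
pred = clf.predict(X_rest)            # splits it into predicted matches / non-matches
```

The two predicted groups then re-enter the queue as separate clusters, which is why the queue length grows to 2 in the following loop.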
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(111, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(483, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 483 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 483 vectors
The selected farthest weight vectors are:
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 7 matches and 66 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.456
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analysing file: diverg(15)16_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (15, 1 - acm diverg, 16), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)16_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 869
Number of entity ID pairs that occurred more than once: 0
Analyse set of 869 weight vectors
Containing 190 true matches and 679 true non-matches
(21.86% true matches)
Identified 829 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 795 (95.90%)
2 : 31 (3.74%)
3 : 2 (0.24%)
6 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 829 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 170
0.000 : 659
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 869
Number of unique weight vectors: 829
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(829, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 829 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 829 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 27 matches and 59 non-matches
Purity of oracle classification: 0.686
Entropy of oracle classification: 0.898
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
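The purity and entropy figures reported above follow directly from the match / non-match counts of the oracle-classified sample; a minimal sketch (function names are illustrative, not the run's actual code):

```python
import math

def purity(matches: int, non_matches: int) -> float:
    """Fraction of samples in the majority class of the sample."""
    total = matches + non_matches
    return max(matches, non_matches) / total

def entropy(matches: int, non_matches: int) -> float:
    """Binary Shannon entropy (in bits) of the match / non-match split."""
    total = matches + non_matches
    h = 0.0
    for count in (matches, non_matches):
        p = count / total
        if p > 0.0:
            h -= p * math.log2(p)
    return h

# Counts from the oracle run above: 27 matches, 59 non-matches.
print(round(purity(27, 59), 3))   # 0.686
print(round(entropy(27, 59), 3))  # 0.898
```

The same two functions reproduce every purity/entropy pair in this log, e.g. 49/1 gives purity 0.980 and entropy 0.141.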
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 743 weight vectors
Based on 27 matches and 59 non-matches
Classified 126 matches and 617 non-matches
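The SVM step above trains on the oracle-labelled sample and propagates labels to the remaining vectors of the cluster, splitting it into a predicted-match and a predicted-non-match child. A sketch of that step, assuming scikit-learn's `SVC`; the kernel choice and function name are illustrative, not the run's actual configuration:

```python
from sklearn.svm import SVC

def svm_split(labeled_vectors, labels, unlabeled_vectors):
    """Train an SVM on oracle-labelled weight vectors and use it to
    split the remaining vectors into predicted matches / non-matches."""
    clf = SVC(kernel="linear")        # illustrative kernel choice
    clf.fit(labeled_vectors, labels)  # labels: True = match, False = non-match
    predictions = clf.predict(unlabeled_vectors)
    matches = [v for v, p in zip(unlabeled_vectors, predictions) if p]
    non_matches = [v for v, p in zip(unlabeled_vectors, predictions) if not p]
    return matches, non_matches
```

In the loop above this split produced the two child clusters of sizes 126 and 617 that appear in the queue on the next iteration.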
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(126, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
(617, 0.686046511627907, 0.8976844934141643, 0.313953488372093)
Current size of match and non-match training data sets: 27 / 59
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.90
- Size 126 weight vectors
- Estimated match proportion 0.314
Sample size for this cluster: 50
Farthest first selection of 50 weight vectors from 126 vectors
The selected farthest weight vectors are:
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
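The farthest-first selections listed above greedily pick, at each step, the vector farthest (here assumed Euclidean) from everything selected so far, so the sample spreads across the cluster. A minimal sketch; the seeding rule is an assumption, as the actual implementation may start from a different vector:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: select k vectors, each new pick
    maximising its distance to the nearest already-selected vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # arbitrary starting point (assumption)
    # Distance of every vector to its nearest selected vector so far.
    nearest = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: nearest[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            nearest[i] = min(nearest[i], dist(v, vectors[idx]))
    return selected
```

Because a selected vector's distance to itself is zero, duplicates are never re-picked, which matches the log's behaviour of sampling distinct weight vectors.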
Perform oracle with 100.00% accuracy on 50 weight vectors
The oracle will correctly classify 50 weight vectors and wrongly classify 0
Classified 49 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.141
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 50 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analyzing the file: diverg(10)972_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 972), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)972_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 870
Number of entity ID pairs that occurred more than once: 0
Analyse set of 870 weight vectors
Containing 175 true matches and 695 true non-matches
(20.11% true matches)
Identified 831 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 801 (96.39%)
2 : 27 (3.25%)
3 : 2 (0.24%)
9 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 831 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 156
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 674
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 861
Number of unique weight vectors: 830
Time to load and analyse the weight vector file: 0.01 sec
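The load-and-analyse step above groups identical weight vectors and measures each unique vector's pureness, the fraction of its occurrences that are true matches; vectors with mixed labels (pureness strictly between 0 and 1) are flagged for removal. A minimal sketch of that grouping, with illustrative names:

```python
from collections import defaultdict

def analyse(vectors, labels):
    """Group duplicate weight vectors and compute each unique vector's
    pureness, i.e. the fraction of its occurrences labelled as a match."""
    groups = defaultdict(list)
    for vec, is_match in zip(vectors, labels):
        groups[tuple(vec)].append(is_match)
    pureness = {vec: sum(lab) / len(lab) for vec, lab in groups.items()}
    # Non-pure: the same vector carries both match and non-match labels.
    non_pure = [vec for vec, p in pureness.items() if 0.0 < p < 1.0]
    return pureness, non_pure
```

In the run above this found one non-pure unique vector with pureness 0.889 (8 matches out of 9 occurrences), and all 9 of its occurrences were dropped before clustering.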
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(830, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 830 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 830 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 31 matches and 55 non-matches
Purity of oracle classification: 0.640
Entropy of oracle classification: 0.943
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 744 weight vectors
Based on 31 matches and 55 non-matches
Classified 175 matches and 569 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(175, 0.6395348837209303, 0.9430685934712908, 0.36046511627906974)
(569, 0.6395348837209303, 0.9430685934712908, 0.36046511627906974)
Current size of match and non-match training data sets: 31 / 55
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 175 weight vectors
- Estimated match proportion 0.360
Sample size for this cluster: 59
Farthest first selection of 59 weight vectors from 175 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.875, 1.000, 0.182, 0.267, 0.237, 0.206, 0.167] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.467, 1.000, 0.231, 0.304, 0.250, 0.115, 0.000] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.619, 1.000, 0.103, 0.163, 0.129, 0.146, 0.213] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 59 weight vectors
The oracle will correctly classify 59 weight vectors and wrongly classify 0
Classified 39 matches and 20 non-matches
Purity of oracle classification: 0.661
Entropy of oracle classification: 0.924
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 20
Number of false non-matches: 0
Deleted 59 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analyzing the file: diverg(20)734_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 734), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)734_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1052
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1052 weight vectors
Containing 223 true matches and 829 true non-matches
(21.20% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 808
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1051
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 26 matches and 61 non-matches
Classified 118 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(118, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 793 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 793 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.736, 1.000, 0.250, 0.290, 0.172, 0.188, 0.286] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 14 matches and 59 non-matches
Purity of oracle classification: 0.808
Entropy of oracle classification: 0.705
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analyzing the file: diverg(15)737_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 737), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)737_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 644
Number of entity ID pairs that occurred more than once: 0
Analyse set of 644 weight vectors
Containing 212 true matches and 432 true non-matches
(32.92% true matches)
Identified 608 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 591 (97.20%)
2 : 14 (2.30%)
3 : 2 (0.33%)
19 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 608 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 429
Removed 1 non-pure weight vector
Final number of weight vectors to use: 643
Number of unique weight vectors: 608
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(608, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 608 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 608 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
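A farthest-first selection like the one above can be sketched as a greedy traversal: repeatedly pick the vector whose distance to the already-selected set is largest. The starting point and the Euclidean distance metric are assumptions here; the program's exact choices are not shown in the log.

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: select k well-spread vectors."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    selected = [int(rng.integers(len(X)))]       # arbitrary starting vector
    # Distance of every vector to its nearest already-selected vector.
    d = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                  # farthest from the selected set
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```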
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
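The purity and entropy reported for this oracle classification follow from the binary match proportion; a quick check using the standard definitions (purity as the majority-class fraction, entropy as binary Shannon entropy):

```python
import math

matches, non_matches = 28, 55          # oracle counts from this loop
p = matches / (matches + non_matches)  # estimated match proportion, ~0.337

purity = max(p, 1 - p)                                       # ~0.663
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # ~0.922
```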
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 525 weight vectors
Based on 28 matches and 55 non-matches
Classified 152 matches and 373 non-matches
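The SVM step trains on the 83 oracle-labelled vectors and splits the 525 remaining ones into predicted matches and non-matches. A hypothetical sketch with random stand-in data (the actual kernel and parameters are not shown in the log; scikit-learn's SVC is assumed):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the 83 oracle-labelled 7-dimensional weight vectors
# (28 matches, 55 non-matches) and the 525 still-unlabelled ones.
train_X = rng.random((83, 7))
train_y = np.array([1] * 28 + [0] * 55)
rest_X = rng.random((525, 7))

clf = SVC(kernel="linear").fit(train_X, train_y)
pred = clf.predict(rest_X)   # 1 = predicted match, 0 = predicted non-match
```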
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(152, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(373, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.92
- Size 152 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 152 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 49 matches and 6 non-matches
Purity of oracle classification: 0.891
Entropy of oracle classification: 0.497
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(20)749_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 749), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)749_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 732
Number of entity ID pairs that occurred more than once: 0
Analyse set of 732 weight vectors
Containing 219 true matches and 513 true non-matches
(29.92% true matches)
Identified 677 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 641 (94.68%)
2 : 33 (4.87%)
3 : 2 (0.30%)
19 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 677 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 492
Removed 1 non-pure weight vector
Final number of weight vectors to use: 731
Number of unique weight vectors: 677
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(677, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 677 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 677 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 593 weight vectors
Based on 27 matches and 57 non-matches
Classified 148 matches and 445 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(445, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.91
- Size 445 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 445 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 8 matches and 62 non-matches
Purity of oracle classification: 0.886
Entropy of oracle classification: 0.513
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(15)635_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 635), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)635_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 0 matches and 829 non-matches
40.0
Analysing file: diverg(20)199_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (20, 1 - acm diverg, 199), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)199_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1053
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1053 weight vectors
Containing 187 true matches and 866 true non-matches
(17.76% true matches)
Identified 1011 unique weight vectors
Frequency distribution of occurences of weight vectors:
Occurence : Number of weight vectors that occur that often
1 : 980 (96.93%)
2 : 28 (2.77%)
3 : 2 (0.20%)
11 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1011 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1052
Number of unique weight vectors: 1011
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1011, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1011 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1011 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
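The farthest-first selection used above greedily picks, at each step, the vector whose minimum distance to the already-selected set is largest, so the sample spreads over the whole weight-vector space. A rough sketch, assuming Euclidean distance and a start from the first vector (both are assumptions; the log does not show the metric or seed):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors (Euclidean distance)."""
    selected = [vectors[0]]  # assumption: start from the first vector
    # Each vector's distance to its nearest already-selected vector
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        # Pick the vector farthest from everything selected so far
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[idx]))
    return selected
```

This explains why the listed sample mixes extreme vectors (all 1.000s, all near-0s) before filling in intermediate ones.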
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
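The purity and entropy reported for an oracle classification follow the usual two-class definitions: purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion. A small sketch reproducing the numbers logged above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a 2-class split."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(24, 63)
# purity ≈ 0.724 and entropy ≈ 0.850, matching the values logged above
```

Note the same 24/87 ≈ 0.276 match proportion reappears later as the cluster's estimated match proportion.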
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 924 weight vectors
Based on 24 matches and 63 non-matches
Classified 91 matches and 833 non-matches
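The SVM step trains on the oracle-labelled sample (24 matches, 63 non-matches) and then classifies all remaining unlabelled weight vectors, splitting the cluster into a predicted-match and a predicted-non-match part. This could be sketched with scikit-learn's `SVC` — an assumption, since the log does not show which SVM implementation, kernel, or parameters are used:

```python
from sklearn import svm

def svm_classify(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on oracle-labelled weight vectors, classify the rest."""
    clf = svm.SVC()  # default RBF kernel; actual settings are an assumption
    clf.fit(train_vectors, train_labels)
    predictions = clf.predict(remaining_vectors)
    matches = [v for v, p in zip(remaining_vectors, predictions) if p]
    non_matches = [v for v, p in zip(remaining_vectors, predictions) if not p]
    return matches, non_matches
```

The two resulting subsets (here 91 and 833 vectors) are then pushed back onto the cluster queue, as the Loop 2 output below shows.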
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(91, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(833, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 91 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 42
Farthest first selection of 42 weight vectors from 91 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 42 weight vectors
The oracle will correctly classify 42 weight vectors and wrongly classify 0
Classified 42 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 42 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(20)901_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 901), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)901_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1100
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1100 weight vectors
Containing 227 true matches and 873 true non-matches
(20.64% true matches)
Identified 1043 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1006 (96.45%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1043 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1099
Number of unique weight vectors: 1043
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1043, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1043 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1043 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 955 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 846 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(846, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)476_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981481
recall 0.177258
f-measure 0.300283
da 54
dm 0
ndm 0
tp 53
fp 1
tn 4.76529e+07
fn 246
Name: (10, 1 - acm diverg, 476), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)476_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1030
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1030 weight vectors
Containing 210 true matches and 820 true non-matches
(20.39% true matches)
Identified 976 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 941 (96.41%)
2 : 32 (3.28%)
3 : 2 (0.20%)
19 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 976 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 176
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 799
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1029
Number of unique weight vectors: 976
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(976, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 976 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 976 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 32 matches and 55 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 889 weight vectors
Based on 32 matches and 55 non-matches
Classified 317 matches and 572 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(317, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
(572, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
Current size of match and non-match training data sets: 32 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 572 weight vectors
- Estimated match proportion 0.368
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 572 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.500, 0.714, 0.800, 0.571, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.571, 0.333, 0.444, 0.400] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
54.0
Analysing the file: diverg(15)318_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (15, 1 - acm diverg, 318), dtype: object
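The precision, recall, and f-measure values above follow directly from the tp/fp/fn counts in the same Series. A minimal sketch of that arithmetic (function and variable names are illustrative, not from the original script):

```python
def prf(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*tp / (2*tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1

# Counts from the run above: tp=54, fp=1, fn=245
p, r, f = prf(54, 1, 245)
print(round(p, 6), round(r, 6), round(f, 6))  # 0.981818 0.180602 0.305085
```

This reproduces the precision 0.981818, recall 0.180602, and f-measure 0.305085 reported above.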
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)318_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 907
Number of entity ID pairs that occurred more than once: 0
Analyse set of 907 weight vectors
Containing 204 true matches and 703 true non-matches
(22.49% true matches)
Identified 858 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 824 (96.04%)
2 : 31 (3.61%)
3 : 2 (0.23%)
15 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 858 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 175
0.933 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 682
Removed 1 non-pure weight vector
Final number of weight vectors to use: 906
Number of unique weight vectors: 858
Time to load and analyse the weight vector file: 0.01 sec
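The pureness analysis above groups identical weight vectors, computes the match fraction per unique vector, and drops the minority-class occurrences of any non-pure vector (here, one vector with pureness 0.933). A hedged sketch of that step; the function name and tie-breaking are assumptions, not the original code:

```python
from collections import defaultdict

def pureness_filter(vectors, labels):
    """Group identical weight vectors, compute the match fraction
    ("pureness") per unique vector, and drop the minority-class
    occurrences of any non-pure vector. Returns kept indices."""
    groups = defaultdict(list)
    for i, v in enumerate(vectors):
        groups[tuple(v)].append(i)

    keep = []
    for idxs in groups.values():
        n_match = sum(labels[i] for i in idxs)
        pureness = n_match / len(idxs)
        majority_is_match = pureness >= 0.5
        for i in idxs:
            # keep only occurrences that agree with the majority class
            if bool(labels[i]) == majority_is_match:
                keep.append(i)
    return sorted(keep)

# Toy example: one vector occurs 3x with labels [1, 1, 0] -> pureness 0.667,
# so its single minority (non-match) occurrence is removed.
vecs = [(0.9, 1.0), (0.9, 1.0), (0.9, 1.0), (0.1, 0.0)]
labs = [1, 1, 0, 0]
print(pureness_filter(vecs, labs))  # [0, 1, 3]
```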
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(858, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 858 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 858 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
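The "far" initial selection announced above is a greedy farthest-first traversal: starting from one vector, it repeatedly picks the vector whose minimum distance to the already-selected set is largest. A minimal sketch under that assumption (the original tool's seeding strategy and distance metric are not shown in the log):

```python
import math

def farthest_first(vectors, k, seed=0):
    """Greedily select k indices: each new pick maximises the
    minimum Euclidean distance to the vectors already selected."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [seed]
    # minimum distance from every vector to the selected set so far
    min_d = [dist(v, vectors[seed]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], dist(v, vectors[nxt]))
    return selected

# Toy 1-D example: the method picks spread-out points first
pts = [(0.0,), (0.1,), (0.5,), (0.9,), (1.0,)]
print(farthest_first(pts, 3))  # [0, 4, 2]
```

This explains why the selected vectors printed above are so diverse: the traversal favours vectors near the corners of the weight space before filling in the interior.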
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
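The purity and entropy figures reported for each oracle round appear to be the standard binary-cluster measures over the match proportion; for 28 matches out of 86 they reproduce the values above. A sketch of that computation (function name is illustrative):

```python
from math import log2

def cluster_stats(n_match: int, n_nonmatch: int):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy of the match proportion (0 for a pure cluster)."""
    n = n_match + n_nonmatch
    p = n_match / n
    purity = max(p, 1 - p)
    entropy = 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))
    return purity, entropy, p

purity, entropy, p = cluster_stats(28, 58)
print(round(purity, 3), round(entropy, 3))  # 0.674 0.91
```

The same triple (purity, entropy, estimated match proportion) is what the queue listing prints for each cluster in the next loop.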
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 772 weight vectors
Based on 28 matches and 58 non-matches
Classified 149 matches and 623 non-matches
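After each oracle round, the remaining unlabelled weight vectors are split by a classifier trained on the accumulated match/non-match examples; the log names an SVM. A hedged sketch using scikit-learn's `SVC` with synthetic stand-in data (the original kernel, parameters, and feature scaling are not shown in the log):

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-ins for the labelled training vectors from the oracle rounds
# (in the run above: 28 matches and 58 non-matches, 7 weights each)
rng = np.random.default_rng(0)
X_match = rng.uniform(0.7, 1.0, size=(28, 7))     # high similarities
X_nonmatch = rng.uniform(0.0, 0.4, size=(58, 7))  # low similarities
X_train = np.vstack([X_match, X_nonmatch])
y_train = np.array([1] * 28 + [0] * 58)

clf = SVC(kernel="rbf")  # kernel choice is an assumption
clf.fit(X_train, y_train)

# Split the remaining unlabelled vectors into predicted matches and
# non-matches, as in "Classified 149 matches and 623 non-matches"
X_rest = rng.uniform(0.0, 1.0, size=(50, 7))
pred = clf.predict(X_rest)
print(f"Classified {pred.sum()} matches and {(pred == 0).sum()} non-matches")
```

The two predicted sub-clusters are then pushed back onto the queue, which is why the next loop shows a queue of length 2 with sizes 149 and 623.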
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(149, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(623, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 149 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 149 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing the file: diverg(20)813_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 813), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)813_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.429, 0.786, 0.750, 0.389, 0.857] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 30 matches and 55 non-matches
Classified 147 matches and 537 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(147, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(537, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.94
- Size 147 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 147 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 53 matches and 2 non-matches
Purity of oracle classification: 0.964
Entropy of oracle classification: 0.225
Number of true matches: 53
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)119_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 119), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)119_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
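The purity and entropy figures reported above follow the usual definitions: purity is the majority-class fraction of the sample, and entropy is the base-2 Shannon entropy of the two-class label distribution. A minimal sketch (the function name is illustrative, not taken from the program):

```python
import math

def purity_entropy(n_match, n_nonmatch):
    """Purity = majority-class fraction; entropy = base-2 Shannon
    entropy of the two-class label distribution."""
    n = n_match + n_nonmatch
    p = n_match / n
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Figures for the 25-match / 63-non-match oracle sample above:
purity, entropy = purity_entropy(25, 63)
print(round(purity, 3), round(entropy, 3))  # 0.716 0.861
```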
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 817 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 817 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 817 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 11 matches and 60 non-matches
Purity of oracle classification: 0.845
Entropy of oracle classification: 0.622
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
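The "Farthest first selection" steps throughout this run are a greedy max-min traversal: starting from a seed vector, each round picks the vector whose distance to its nearest already-selected vector is largest. A minimal sketch, assuming Euclidean distance and a first-element seed (the actual program may seed and measure distance differently):

```python
import math

def farthest_first(vectors, k):
    """Greedy max-min selection of k vectors: repeatedly pick the
    vector farthest from its nearest already-selected vector."""
    selected = [vectors[0]]
    # distance of every vector to its nearest selected vector
    dists = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=dists.__getitem__)
        selected.append(vectors[i])
        # incremental update: new candidate may now be the nearest
        dists = [min(d, math.dist(v, vectors[i]))
                 for d, v in zip(dists, vectors)]
    return selected
```

The incremental `min(d, ...)` update keeps the selection O(n * k) instead of recomputing all pairwise distances each round.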
Analysing file: diverg(10)229_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 229), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)229_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 453
Number of entity ID pairs that occurred more than once: 0
Analyse set of 453 weight vectors
Containing 218 true matches and 235 true non-matches
(48.12% true matches)
Identified 417 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 398 (95.44%)
2 : 16 (3.84%)
3 : 2 (0.48%)
17 : 1 (0.24%)
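A frequency-of-occurrence table like the one above can be built with two passes of `collections.Counter`: first count how often each unique vector occurs, then count how many vectors share each occurrence count. A sketch on toy data (not the log's vectors):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors that occur
    that often (the log's 'Occurrence : Number of weight vectors')."""
    per_vector = Counter(map(tuple, vectors))   # unique vector -> count
    return Counter(per_vector.values())         # count -> number of vectors

vecs = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3],
        [0.9, 0.9], [0.9, 0.9], [0.9, 0.9]]
print(sorted(occurrence_distribution(vecs).items()))  # [(1, 1), (2, 1), (3, 1)]
```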
Identified 1 non-pure unique weight vector (from 417 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 232
Removed 1 non-pure weight vector
Final number of weight vectors to use: 452
Number of unique weight vectors: 417
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
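"Non-pure" unique weight vectors are those observed with both labels; the step above removes their minority-label copies (e.g. the single False copy of the vector with pureness 0.941). A sketch of that filter; the function name and the tie-break are illustrative, not from the program:

```python
from collections import defaultdict

def remove_minority_copies(labelled_vectors):
    """For each non-pure unique vector (observed with both labels),
    drop the copies carrying the minority label; pure vectors pass
    through untouched. Tie-break (50/50 split) keeps the True copies."""
    groups = defaultdict(list)
    for vec, label in labelled_vectors:
        groups[tuple(vec)].append(label)
    kept = []
    for vec, label in labelled_vectors:
        labels = groups[tuple(vec)]
        n_true = sum(labels)
        if n_true in (0, len(labels)):            # pure: keep all copies
            kept.append((vec, label))
        elif label == (n_true >= len(labels) - n_true):
            kept.append((vec, label))             # majority-label copy
    return kept
```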
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(417, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 417 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 417 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 37 matches and 41 non-matches
Purity of oracle classification: 0.526
Entropy of oracle classification: 0.998
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 339 weight vectors
Based on 37 matches and 41 non-matches
Classified 278 matches and 61 non-matches
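The SVM step trains on the oracle-labelled sample (37 matches, 41 non-matches here) and labels the cluster's remaining vectors; the predicted labels split the cluster into two children. The log does not name the kernel or library used, so below is only a pure-NumPy linear-SVM sketch (regularised hinge loss, subgradient descent) under that assumption:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Primal linear SVM via subgradient descent on the regularised
    hinge loss; labels y must be in {-1, +1}."""
    rng = np.random.default_rng(0)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1.0:   # hinge active: push margin out
                w = (1.0 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                              # only the L2 regulariser acts
                w = (1.0 - lr * lam) * w
    return w, b

def svm_classify(X, w, b):
    return np.sign(X @ w + b)

# toy stand-in for an oracle-labelled sample: matches (+1) vs non-matches (-1)
X = np.array([[0.9, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.1]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
```

Applying `svm_classify` to a cluster's unclassified vectors and grouping by predicted label gives the two child clusters placed back on the queue.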
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(278, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
(61, 0.5256410256410257, 0.9981021327390103, 0.47435897435897434)
Current size of match and non-match training data sets: 37 / 41
Selected cluster with (queue ordering: random):
- Purity 0.53 and entropy 1.00
- Size 61 weight vectors
- Estimated match proportion 0.474
Sample size for this cluster: 38
Farthest first selection of 38 weight vectors from 61 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.778, 0.636, 0.375, 0.556, 0.625] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[1.000, 0.000, 0.818, 0.727, 0.438, 0.375, 0.400] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
Perform oracle with 100.00% accuracy on 38 weight vectors
The oracle will correctly classify 38 weight vectors and wrongly classify 0
Classified 0 matches and 38 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 38 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)52_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 52), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)52_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 855
Number of entity ID pairs that occurred more than once: 0
Analyse set of 855 weight vectors
Containing 221 true matches and 634 true non-matches
(25.85% true matches)
Identified 799 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 763 (95.49%)
2 : 33 (4.13%)
3 : 2 (0.25%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 799 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 613
Removed 1 non-pure weight vector
Final number of weight vectors to use: 854
Number of unique weight vectors: 799
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(799, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 799 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 799 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 714 weight vectors
Based on 28 matches and 57 non-matches
Classified 150 matches and 564 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(564, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 150 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 150 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(20)785_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 785), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)785_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1069
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1069 weight vectors
Containing 221 true matches and 848 true non-matches
(20.67% true matches)
Identified 1013 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 977 (96.45%)
2 : 33 (3.26%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1013 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 827
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1068
Number of unique weight vectors: 1013
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1013, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1013 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1013 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
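The purity and entropy figures printed after each oracle round are the standard two-class cluster measures: purity is the majority-class fraction, entropy the binary Shannon entropy of the match/non-match split. A minimal sketch (function names are illustrative):

```python
import math

def purity(num_matches, num_non_matches):
    # Fraction of the majority class among the classified vectors
    total = num_matches + num_non_matches
    return max(num_matches, num_non_matches) / total

def entropy(num_matches, num_non_matches):
    # Binary (Shannon) entropy of the match / non-match split, in bits
    total = num_matches + num_non_matches
    h = 0.0
    for count in (num_matches, num_non_matches):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h

# The 23 matches / 64 non-matches classified above give the printed values
print(round(purity(23, 64), 3), round(entropy(23, 64), 3))  # 0.736 0.833
```

The same functions reproduce the later rounds, e.g. 49 matches / 5 non-matches gives purity 0.907 and entropy 0.445.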
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 926 weight vectors
Based on 23 matches and 64 non-matches
Classified 106 matches and 820 non-matches
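The SVM step propagates the 87 oracle labels to the 926 remaining vectors, splitting the cluster into the two sub-clusters that appear in the loop-2 queue. A sketch of that step under stated assumptions: scikit-learn's `SVC` with a linear kernel and random stand-in data are my choices, since the log does not show the program's actual kernel, parameters, or vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the oracle-labelled seed set: 23 match and 64 non-match
# weight vectors with 7 similarity weights each (title, artist, track01, ...)
train_x = rng.random((87, 7))
train_y = np.array([1] * 23 + [0] * 64)  # 1 = match, 0 = non-match

# Stand-ins for the remaining unlabelled vectors in the cluster
unlabelled = rng.random((926, 7))

clf = SVC(kernel="linear")
clf.fit(train_x, train_y)
pred = clf.predict(unlabelled)

# The predicted classes split the cluster into a match and a non-match
# sub-cluster, both of which are pushed back onto the queue
matches = unlabelled[pred == 1]
non_matches = unlabelled[pred == 0]
```

With the real weight vectors, this split yields the 106-match / 820-non-match partition reported above.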
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(106, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(15)699_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 699), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)699_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 731
Number of entity ID pairs that occurred more than once: 0
Analyse set of 731 weight vectors
Containing 210 true matches and 521 true non-matches
(28.73% true matches)
Identified 697 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 680 (97.56%)
2 : 14 (2.01%)
3 : 2 (0.29%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 697 unique weight vectors)
Pureness (fraction of occurrences that are matches) for each unique weight vector:
Pureness : Count
1.000 : 178
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 518
Removed 1 non-pure weight vector
Final number of weight vectors to use: 730
Number of unique weight vectors: 697
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(697, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 697 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 697 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 613 weight vectors
Based on 29 matches and 55 non-matches
Classified 142 matches and 471 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(471, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 142 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(20)392_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 392), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)392_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (fraction of occurrences that are matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 956 non-matches
39.0
Analysing file: diverg(10)482_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.982143
recall 0.183946
f-measure 0.309859
da 56
dm 0
ndm 0
tp 55
fp 1
tn 4.76529e+07
fn 244
Name: (10, 1 - acm diverg, 482), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)482_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 435
Number of entity ID pairs that occurred more than once: 0
Analyse set of 435 weight vectors
Containing 205 true matches and 230 true non-matches
(47.13% true matches)
Identified 403 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 389 (96.53%)
2 : 11 (2.73%)
3 : 2 (0.50%)
18 : 1 (0.25%)
Identified 1 non-pure unique weight vector (from 403 unique weight vectors)
Pureness (fraction of occurrences that are matches) for each unique weight vector:
Pureness : Count
1.000 : 173
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 229
Removed 1 non-pure weight vector
Final number of weight vectors to use: 434
Number of unique weight vectors: 403
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(403, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 403 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using "far" method
Farthest first selection of 77 weight vectors from 403 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 36 matches and 41 non-matches
Purity of oracle classification: 0.532
Entropy of oracle classification: 0.997
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
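The purity and entropy figures reported for an oracle-classified sample follow directly from the match/non-match counts: purity is the majority-class fraction, entropy the base-2 Shannon entropy of the split. A minimal sketch:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = Shannon entropy
    (base 2) of the match / non-match proportions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

For the 36 matches and 41 non-matches above this gives purity ≈ 0.532 and entropy ≈ 0.997, matching the log.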
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 326 weight vectors
Based on 36 matches and 41 non-matches
Classified 136 matches and 190 non-matches
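The cluster-splitting step above — train an SVM on the oracle-labelled sample, then partition the remaining vectors by its predictions — can be sketched as follows. This assumes scikit-learn's `SVC`; the actual SVM implementation and kernel used to produce this log are not shown here.

```python
# Sketch only: scikit-learn and the linear kernel are assumptions.
from sklearn.svm import SVC

def svm_split(train_vectors, train_labels, cluster_vectors):
    """Train an SVM on the oracle-labelled vectors, then split the
    rest of the cluster into predicted-match / predicted-non-match
    child clusters."""
    clf = SVC(kernel="linear")
    clf.fit(train_vectors, train_labels)
    preds = clf.predict(cluster_vectors)
    matches = [v for v, p in zip(cluster_vectors, preds) if p]
    non_matches = [v for v, p in zip(cluster_vectors, preds) if not p]
    return matches, non_matches
```

Both child clusters are then pushed back onto the queue, which is why Loop 2 below reports a queue length of 2, with each child initially inheriting the parent's purity, entropy, and estimated match proportion.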
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(136, 0.5324675324675324, 0.9969562518473083, 0.4675324675324675)
(190, 0.5324675324675324, 0.9969562518473083, 0.4675324675324675)
Current size of match and non-match training data sets: 36 / 41
Selected cluster (queue ordering: random) with:
- Purity 0.53 and entropy 1.00
- Size 136 weight vectors
- Estimated match proportion 0.468
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 136 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 49 matches and 7 non-matches
Purity of oracle classification: 0.875
Entropy of oracle classification: 0.544
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
56.0
Analysing the file: diverg(15)659_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (15, 1 - acm diverg, 659), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)659_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1092
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1092 weight vectors
Containing 221 true matches and 871 true non-matches
(20.24% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1000 (96.53%)
2 : 33 (3.19%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 850
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1091
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 103 matches and 845 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(845, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 845 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 845 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(10)349_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 349), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)349_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 332
Number of entity ID pairs that occurred more than once: 0
Analyse set of 332 weight vectors
Containing 178 true matches and 154 true non-matches
(53.61% true matches)
Identified 300 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 285 (95.00%)
2 : 12 (4.00%)
3 : 2 (0.67%)
17 : 1 (0.33%)
Identified 1 non-pure unique weight vector (from 300 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 148
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 151
Removed 1 non-pure weight vector
Final number of weight vectors to use: 331
Number of unique weight vectors: 300
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(300, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 300 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 73
Perform initial selection using "far" method
Farthest first selection of 73 weight vectors from 300 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.857, 0.571, 0.556, 0.556, 0.722] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 32 matches and 41 non-matches
Purity of oracle classification: 0.562
Entropy of oracle classification: 0.989
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 227 weight vectors
Based on 32 matches and 41 non-matches
Classified 145 matches and 82 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 73
Size, purity, entropy, and estimated match proportion of clusters in queue:
(145, 0.5616438356164384, 0.9890076795739704, 0.4383561643835616)
(82, 0.5616438356164384, 0.9890076795739704, 0.4383561643835616)
Current size of match and non-match training data sets: 32 / 41
Selected cluster (queue ordering: random) with:
- Purity 0.56 and entropy 0.99
- Size 82 weight vectors
- Estimated match proportion 0.438
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 82 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.456, 1.000, 0.087, 0.208, 0.125, 0.152, 0.061] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.750, 0.000, 0.206, 0.391, 0.351, 0.261, 0.146] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
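The "farthest first selection" above greedily picks, at each step, the weight vector whose minimum distance to the already-selected vectors is largest. A minimal numpy sketch; Euclidean distance and a random starting vector are assumptions, since the log does not show the metric or the seed:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: start from one vector, then
    repeatedly add the vector whose distance to the nearest already
    selected vector is largest."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]
    # distance of every vector to its nearest selected vector so far
    d = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(d))
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# two tight groups: farthest-first picks one vector from each
pts = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [1.0, 0.9]]
print(farthest_first(pts, 2, seed=0))
```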
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 1 match and 43 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 43
Number of false non-matches: 0
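The purity and entropy figures reported for each oracle classification are consistent with the majority-class fraction and the binary Shannon entropy of the sample's match proportion. A minimal sketch (the function name is illustrative, not from the original program):

```python
import math

def oracle_stats(num_matches, num_non_matches):
    """Purity (majority-class fraction) and binary Shannon entropy of
    an oracle-classified sample of weight vectors."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# the cluster above: 1 match, 43 non-matches
print(oracle_stats(1, 43))  # ≈ (0.977, 0.156), matching the log
```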
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing file: diverg(15)503_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 503), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)503_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1027
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1027 weight vectors
Containing 223 true matches and 804 true non-matches
(21.71% true matches)
Identified 973 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 936 (96.20%)
2 : 34 (3.49%)
3 : 2 (0.21%)
17 : 1 (0.10%)
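The occurrence table above can be reproduced by counting how often each distinct weight vector appears, then counting how many vectors share each occurrence count. A sketch with `collections.Counter` (toy data, not the actual CSV):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Occurrence : number of distinct weight vectors that occur
    that often, as printed in the log's frequency table."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())

vecs = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.2, 0.1)]
print(sorted(occurrence_distribution(vecs).items()))  # → [(1, 2), (2, 1)]
```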
Identified 1 non-pure unique weight vector (from 973 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 783
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1026
Number of unique weight vectors: 973
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(973, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 973 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 973 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 886 weight vectors
Based on 26 matches and 61 non-matches
Classified 131 matches and 755 non-matches
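The SVM step trains on the oracle-labelled sample (26 matches, 61 non-matches here) and splits the remaining unlabelled cluster by predicted class; the two child clusters then re-enter the queue. The sketch below substitutes a nearest-centroid rule for the SVM so it needs nothing beyond numpy; the log's actual kernel and parameters are not shown:

```python
import numpy as np

def split_cluster(train_X, train_y, rest_X):
    """Split unlabelled cluster vectors into predicted matches and
    non-matches. Nearest-centroid stands in for the SVM classifier;
    the splitting bookkeeping is the same."""
    train_X = np.asarray(train_X, dtype=float)
    rest_X = np.asarray(rest_X, dtype=float)
    train_y = np.asarray(train_y, dtype=bool)
    c_match = train_X[train_y].mean(axis=0)   # centroid of oracle matches
    c_non = train_X[~train_y].mean(axis=0)    # centroid of oracle non-matches
    is_match = (np.linalg.norm(rest_X - c_match, axis=1)
                < np.linalg.norm(rest_X - c_non, axis=1))
    return rest_X[is_match], rest_X[~is_match]

labelled = [[0.9, 1.0], [1.0, 0.9], [0.1, 0.0], [0.0, 0.2]]
labels = [True, True, False, False]
m, n = split_cluster(labelled, labels, [[0.95, 0.95], [0.05, 0.1], [0.8, 0.9]])
print(len(m), len(n))  # two predicted matches, one predicted non-match
```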
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(755, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 131 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 50
Farthest first selection of 50 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 50 weight vectors
The oracle will correctly classify 50 weight vectors and wrongly classify 0
Classified 49 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.141
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 50 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)288_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 288), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)288_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 26 matches and 62 non-matches
Purity of oracle classification: 0.705
Entropy of oracle classification: 0.876
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 26 matches and 62 non-matches
Classified 119 matches and 829 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(119, 0.7045454545454546, 0.8756633923230397, 0.29545454545454547)
(829, 0.7045454545454546, 0.8756633923230397, 0.29545454545454547)
Current size of match and non-match training data sets: 26 / 62
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 119 weight vectors
- Estimated match proportion 0.295
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 119 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-match
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)463_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 463), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)463_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 664
Number of entity ID pairs that occurred more than once: 0
Analyse set of 664 weight vectors
Containing 212 true matches and 452 true non-matches
(31.93% true matches)
Identified 612 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 576 (94.12%)
2 : 33 (5.39%)
3 : 2 (0.33%)
16 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 612 unique weight vectors)
Pureness (as the proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 431
Removed 1 non-pure weight vector
Final number of weight vectors to use: 663
Number of unique weight vectors: 612
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(612, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 612 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 612 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 529 weight vectors
Based on 28 matches and 55 non-matches
Classified 155 matches and 374 non-matches
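The split step above trains a classifier on the oracle-labelled vectors (28 matches, 55 non-matches) and partitions the unlabelled remainder into predicted-match and predicted-non-match child clusters. The log uses an SVM; the self-contained stand-in below uses a simple perceptron to illustrate the same train-then-partition pattern (all data and names here are hypothetical, not taken from the run):

```python
def train_perceptron(X, y, epochs=100, lr=0.1):
    """Train a linear classifier (a stand-in for the SVM used in the log)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):          # yi is +1 (match) or -1 (non-match)
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def split_cluster(w, b, unlabelled):
    """Partition unlabelled vectors into predicted matches / non-matches."""
    matches, non_matches = [], []
    for v in unlabelled:
        score = sum(wj * xj for wj, xj in zip(w, v)) + b
        (matches if score > 0 else non_matches).append(v)
    return matches, non_matches

# Toy training data: high similarities -> match (+1), low -> non-match (-1)
X = [(0.9, 0.8), (0.95, 0.9), (0.1, 0.2), (0.2, 0.1)]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
matches, non_matches = split_cluster(w, b, [(0.85, 0.9), (0.15, 0.1)])
print(len(matches), len(non_matches))  # 1 1
```

With scikit-learn available, `sklearn.svm.SVC().fit(X, y)` followed by `predict` on the unlabelled vectors would fill the same role.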
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(155, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(374, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 374 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 374 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.917, 0.786, 0.263, 0.500, 0.556] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.767, 0.667, 0.545, 0.786, 0.773] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.857, 0.444, 0.556, 0.235, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 2 matches and 68 non-matches
Purity of oracle classification: 0.971
Entropy of oracle classification: 0.187
Number of true matches: 2
Number of false matches: 0
Number of true non-matches: 68
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
Analysing file: diverg(20)383_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 383), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)383_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1069
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1069 weight vectors
Containing 221 true matches and 848 true non-matches
(20.67% true matches)
Identified 1013 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 977 (96.45%)
2 : 33 (3.26%)
3 : 2 (0.20%)
20 : 1 (0.10%)
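A frequency distribution like the one above can be computed by counting duplicate vectors; a minimal sketch with hypothetical 2-dimensional vectors (the log uses 7-dimensional similarity vectors, and its percentages are relative to the number of unique vectors):

```python
from collections import Counter

# Hypothetical weight vectors (tuples, so they are hashable)
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9), (0.2, 0.9), (0.2, 0.9), (0.7, 0.1)]

occurrences = Counter(vectors)                 # vector -> how often it occurs
distribution = Counter(occurrences.values())   # occurrence count -> number of vectors

for count in sorted(distribution):
    n = distribution[count]
    print(f"{count} : {n} ({100.0 * n / len(occurrences):.2f}%)")
```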
Identified 1 non-pure unique weight vector (from 1013 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 827
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1068
Number of unique weight vectors: 1013
Time to load and analyse the weight vector file: 0.01 sec
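The removal step above can be read as: for any weight vector whose duplicate occurrences carry mixed labels, drop the minority-class copies (here, the single non-match among 20 occurrences at pureness 0.950). The exact rule is inferred from the log messages, not confirmed; a sketch under that assumption:

```python
from collections import defaultdict

def remove_minority_copies(records):
    """records: list of (weight_vector_tuple, is_match). For any vector whose
    copies mix matches and non-matches, drop the minority-class copies.
    (Tie handling at pureness 0.5 is an arbitrary choice here.)"""
    groups = defaultdict(list)
    for vec, label in records:
        groups[vec].append(label)
    kept = []
    for vec, label in records:
        labels = groups[vec]
        match_frac = sum(labels) / len(labels)   # pureness as match proportion
        majority = match_frac >= 0.5
        if match_frac in (0.0, 1.0) or label == majority:
            kept.append((vec, label))
    return kept

# A vector occurring 20 times with 19 matches and 1 non-match (pureness 0.950):
records = [((1.0, 1.0), True)] * 19 + [((1.0, 1.0), False)] + [((0.1, 0.2), False)]
print(len(remove_minority_copies(records)))  # 20
```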
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1013, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1013 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1013 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
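Farthest-first selection, as used for the "far" method above, is a greedy traversal: starting from a seed vector, each step adds the candidate whose minimum distance to the already-selected set is largest, spreading the sample across the cluster. A minimal sketch (the Euclidean metric and the seed choice are assumptions; the log does not state either):

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first traversal: each step adds the vector whose
    minimum distance to the already-selected set is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[seed_index]]
    remaining = [v for i, v in enumerate(vectors) if i != seed_index]
    while len(selected) < k and remaining:
        # Farthest point: maximise the distance to its nearest selected vector
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0)], k=2)
print(sample)  # [(0.0, 0.0), (1.0, 1.0)]
```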
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 926 weight vectors
Based on 23 matches and 64 non-matches
Classified 106 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(106, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(10)873_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 873), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)873_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 751
Number of entity ID pairs that occurred more than once: 0
Analyse set of 751 weight vectors
Containing 204 true matches and 547 true non-matches
(27.16% true matches)
Identified 713 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 695 (97.48%)
2 : 15 (2.10%)
3 : 2 (0.28%)
20 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 713 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 168
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 544
Removed 1 non-pure weight vector
Final number of weight vectors to use: 750
Number of unique weight vectors: 713
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(713, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 713 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 713 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 32 matches and 52 non-matches
Purity of oracle classification: 0.619
Entropy of oracle classification: 0.959
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 629 weight vectors
Based on 32 matches and 52 non-matches
Classified 184 matches and 445 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(184, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
(445, 0.6190476190476191, 0.9587118829771318, 0.38095238095238093)
Current size of match and non-match training data sets: 32 / 52
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 184 weight vectors
- Estimated match proportion 0.381
Sample size for this cluster: 61
Farthest first selection of 61 weight vectors from 184 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.817, 1.000, 0.250, 0.212, 0.256, 0.045, 0.250] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.780, 1.000, 0.271, 0.152, 0.137, 0.250, 0.167] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
Perform oracle with 100.00% accuracy on 61 weight vectors
The oracle will correctly classify 61 weight vectors and wrongly classify 0
Classified 48 matches and 13 non-matches
Purity of oracle classification: 0.787
Entropy of oracle classification: 0.747
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 13
Number of false non-matches: 0
Deleted 61 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(10)273_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 273), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)273_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 999
Number of entity ID pairs that occurred more than once: 0
Analyse set of 999 weight vectors
Containing 186 true matches and 813 true non-matches
(18.62% true matches)
Identified 957 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 926 (96.76%)
2 : 28 (2.93%)
3 : 2 (0.21%)
11 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 957 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 164
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 792
Removed 1 non-pure weight vector
Final number of weight vectors to use: 998
Number of unique weight vectors: 957
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(957, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 957 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 957 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
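The farthest-first selection used above can be sketched as a generic greedy farthest-first traversal under Euclidean distance (the actual implementation, seed vector, and distance metric are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors: start from the first one, then repeatedly
    take the vector whose minimum distance to the already-selected set
    is largest."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)], 3)
# (0.5, 0.5) is left out: it is never the farthest from the selected set.
```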
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
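The purity and entropy values reported for each oracle classification are consistent with the standard two-class definitions: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the match proportion. A minimal sketch:

```python
import math

def purity_entropy(n_match, n_non_match):
    total = n_match + n_non_match
    p = n_match / total                 # match proportion
    purity = max(p, 1.0 - p)            # majority-class fraction
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)  # binary Shannon entropy
    return purity, entropy

# Reproduces the figures for the 31-match / 56-non-match oracle result above:
purity, entropy = purity_entropy(31, 56)  # ≈ 0.644, ≈ 0.940
```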
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 870 weight vectors
Based on 31 matches and 56 non-matches
Classified 301 matches and 569 non-matches
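The SVM split step could look like the sketch below, which assumes scikit-learn's `SVC` and toy data (the kernel and parameters of the actual run are not logged):

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)

# Toy oracle-labelled training data: 7-dimensional weight vectors, with
# low similarities labelled non-match (0) and high similarities match (1).
X_train = np.vstack([rng.random((10, 7)) * 0.4,
                     rng.random((10, 7)) * 0.4 + 0.6])
y_train = np.array([0] * 10 + [1] * 10)

clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
clf.fit(X_train, y_train)

# Predict labels for the unlabelled vectors left in the cluster; the
# predictions split it into a "match" and a "non-match" sub-cluster,
# both of which go back onto the queue.
X_rest = rng.random((100, 7))
pred = clf.predict(X_rest)
match_cluster = X_rest[pred == 1]
non_match_cluster = X_rest[pred == 0]
```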
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(301, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(569, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 301 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 301 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 41 matches and 27 non-matches
Purity of oracle classification: 0.603
Entropy of oracle classification: 0.969
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 27
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analyzing file: diverg(20)651_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 651), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)651_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)170_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (15, 1 - acm diverg, 170), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)170_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 793
Number of entity ID pairs that occurred more than once: 0
Analyse set of 793 weight vectors
Containing 187 true matches and 606 true non-matches
(23.58% true matches)
Identified 751 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 720 (95.87%)
2 : 28 (3.73%)
3 : 2 (0.27%)
11 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 751 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 585
Removed 1 non-pure weight vector
Final number of weight vectors to use: 792
Number of unique weight vectors: 751
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(751, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 751 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 751 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 666 weight vectors
Based on 30 matches and 55 non-matches
Classified 308 matches and 358 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(308, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(358, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.94
- Size 358 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 358 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.700, 0.429, 0.476, 0.647, 0.810] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.370, 0.818, 0.800, 0.550, 0.500] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.741, 0.556, 0.667, 0.350, 0.556] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.667, 0.500, 0.524, 0.786, 0.524] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.857, 0.875, 0.625, 0.333, 0.667] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and misclassify 0
Classified 0 matches and 70 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
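The oracle rounds logged here ("Perform oracle with X accuracy ... correctly classify ... misclassify ...") amount to a label source that returns the true label with a fixed probability. A minimal sketch of that behaviour, assuming nothing about the real implementation beyond what the log shows (the function name and RNG handling are illustrative):

```python
import random

def noisy_oracle(true_labels, accuracy, seed=42):
    """Simulated human oracle: return each true label unchanged with
    probability `accuracy`, flipped otherwise. At accuracy 1.0 (as in
    the '100.00% accuracy' runs in this log) all labels come back
    unchanged. Sketch only; not the actual experiment code."""
    rng = random.Random(seed)
    return [lbl if rng.random() < accuracy else not lbl
            for lbl in true_labels]
```

With `accuracy=1.0` the counts of false matches and false non-matches are necessarily zero, which matches every oracle block in this log.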
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
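The "Farthest-first selection of k weight vectors from n vectors" steps above are a greedy farthest-first traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and a first-vector seed (both assumptions; the actual metric and seeding are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of weight vectors.
    Assumes k <= number of distinct vectors; ties break on list order.
    Sketch of the 'far' selection method named in the log, not the
    actual implementation."""
    selected = [vectors[0]]  # seed with the first vector (assumption)
    while len(selected) < k:
        best, best_dist = None, -1.0
        for v in vectors:
            # distance from v to the selected set = min over members
            d = min(math.dist(v, s) for s in selected)
            if d > best_dist:
                best, best_dist = v, d
        selected.append(best)
    return selected
```

Each greedy step costs O(n * |selected|), so selecting k of n vectors is O(n * k^2) distance evaluations, which is cheap at the cluster sizes seen here (hundreds of vectors).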
Analysing the file: diverg(20)336_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (20, 1 - acm diverg, 336), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)336_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 908
Number of entity ID pairs that occurred more than once: 0
Analyse set of 908 weight vectors
Containing 212 true matches and 696 true non-matches
(23.35% true matches)
Identified 856 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 820 (95.79%)
2 : 33 (3.86%)
3 : 2 (0.23%)
16 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 856 unique weight vectors)
Pureness (as fraction of matches) of each unique weight vector:
Pureness : Count
1.000 : 180
0.938 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 675
Removed 1 non-pure weight vector
Final number of weight vectors to use: 907
Number of unique weight vectors: 856
Time to load and analyse the weight vector file: 0.01 sec
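The "Frequency distribution of occurrences" table produced during this load/analyse phase can be reproduced by counting duplicates twice over: once per distinct vector, then once per occurrence count. A small sketch (assuming each weight vector is a sequence of floats):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence-count -> number of distinct weight vectors that
    occur that often (the frequency table printed in the log).
    Sketch only; assumes vectors are hashable once converted to tuples."""
    per_vector = Counter(tuple(v) for v in weight_vectors)  # vector -> count
    return Counter(per_vector.values())                     # count -> how many vectors
```

For the file above this would yield {1: 820, 2: 33, 3: 2, 16: 1} over the 856 unique vectors.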
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(856, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 856 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest-first selection of 86 weight vectors from 856 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and misclassify 0
Classified 29 matches and 57 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
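The purity and entropy reported after each oracle round follow directly from the match/non-match counts: purity is the majority-class fraction and entropy the binary Shannon entropy in bits. A sketch of these (assumed) formulas, which reproduces the 0.663 / 0.922 values above for 29 matches and 57 non-matches:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction of the cluster; entropy =
    binary Shannon entropy (bits) of the match/non-match split.
    Sketch inferred from the values printed in the log."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                      # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

A perfectly pure sample (e.g. 0 matches, 70 non-matches, as in the earlier run) gives purity 1.000 and entropy 0.000, exactly as logged.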
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 770 weight vectors
Based on 29 matches and 57 non-matches
Classified 165 matches and 605 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(165, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
(605, 0.6627906976744186, 0.9221231306777973, 0.3372093023255814)
Current size of match and non-match training data sets: 29 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 165 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 57
Farthest-first selection of 57 weight vectors from 165 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 57 weight vectors
The oracle will correctly classify 57 weight vectors and misclassify 0
Classified 46 matches and 11 non-matches
Purity of oracle classification: 0.807
Entropy of oracle classification: 0.708
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 11
Number of false non-matches: 0
Deleted 57 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
43.0
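Each loop in the run above ends by training an SVM on the oracle-labelled vectors and using it to split the remaining cluster in two (the "SVM classification of N weight vectors / Based on M matches and K non-matches / Classified X matches and Y non-matches" lines). A sketch assuming scikit-learn's SVC; the kernel and parameters of the actual run are unknown:

```python
from sklearn.svm import SVC  # assumption: scikit-learn; not confirmed by the log

def svm_split(train_vecs, train_labels, cluster_vecs):
    """Train an SVM on the oracle-labelled weight vectors, then split the
    remaining cluster into predicted matches and non-matches. Sketch of
    the splitting step named in the log; kernel choice is an assumption."""
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)      # labels: True = match
    preds = clf.predict(cluster_vecs)
    matches = [v for v, p in zip(cluster_vecs, preds) if p]
    non_matches = [v for v, p in zip(cluster_vecs, preds) if not p]
    return matches, non_matches
```

The two resulting sub-clusters are what re-enters the queue, which is why the queue length grows to 2 after loop 1 in each run.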
Analysing the file: diverg(15)363_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (15, 1 - acm diverg, 363), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)363_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 863
Number of entity ID pairs that occurred more than once: 0
Analyse set of 863 weight vectors
Containing 160 true matches and 703 true non-matches
(18.54% true matches)
Identified 829 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 799 (96.38%)
2 : 27 (3.26%)
3 : 2 (0.24%)
4 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 829 unique weight vectors)
Pureness (as fraction of matches) of each unique weight vector:
Pureness : Count
1.000 : 146
0.000 : 683
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 863
Number of unique weight vectors: 829
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(829, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 829 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest-first selection of 86 weight vectors from 829 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and misclassify 0
Classified 26 matches and 60 non-matches
Purity of oracle classification: 0.698
Entropy of oracle classification: 0.884
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 743 weight vectors
Based on 26 matches and 60 non-matches
Classified 94 matches and 649 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(94, 0.6976744186046512, 0.8841151220488478, 0.3023255813953488)
(649, 0.6976744186046512, 0.8841151220488478, 0.3023255813953488)
Current size of match and non-match training data sets: 26 / 60
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 649 weight vectors
- Estimated match proportion 0.302
Sample size for this cluster: 72
Farthest-first selection of 72 weight vectors from 649 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and misclassify 0
Classified 11 matches and 61 non-matches
Purity of oracle classification: 0.847
Entropy of oracle classification: 0.617
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing the file: diverg(20)404_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 404), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)404_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1086
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1086 weight vectors
Containing 220 true matches and 866 true non-matches
(20.26% true matches)
Identified 1030 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 994 (96.50%)
2 : 33 (3.20%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1030 unique weight vectors)
Pureness (as fraction of matches) of each unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority-class weight vectors with this pureness will be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1085
Number of unique weight vectors: 1030
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1030, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1030 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest-first selection of 88 weight vectors from 1030 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
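The farthest-first traversal used for the selection above can be sketched as follows. This is a minimal sketch, not the tool's actual implementation; the function name, the Euclidean metric, and the choice of start vector are assumptions — the greedy rule is to repeatedly add the vector whose minimum distance to the already-selected set is largest:

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection: repeatedly add the vector whose
    minimum Euclidean distance to the selected set is largest."""
    selected = [vectors[start]]
    # min_dist[i] tracks each vector's distance to its nearest selected vector
    min_dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        # Update each vector's distance to the nearest selected vector
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[idx]))
    return selected
```

Each round costs one pass over the remaining vectors, so selecting 88 of 1030 seven-dimensional vectors is cheap; the effect is a spread-out sample that covers the extremes of the cluster rather than its dense core.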
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 24 matches and 64 non-matches
Purity of oracle classification: 0.727
Entropy of oracle classification: 0.845
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
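The purity and entropy reported above follow directly from the oracle's 24/64 match/non-match split. A minimal sketch, assuming (as the figures suggest) that purity is the majority-class fraction and entropy is the binary Shannon entropy of the match proportion:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy (base 2) of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

# 24 matches and 64 non-matches, as classified by the oracle above
purity, entropy = purity_entropy(24, 64)  # purity ≈ 0.727, entropy ≈ 0.845
```

These are exactly the values carried into the Loop 2 queue entries (0.7272…, 0.8453…), i.e. both child clusters inherit the parent sample's statistics.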
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 942 weight vectors
Based on 24 matches and 64 non-matches
Classified 86 matches and 856 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(86, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
(856, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
Current size of match and non-match training data sets: 24 / 64
Selected cluster (queue ordering: random) with:
- Purity 0.73 and entropy 0.85
- Size 86 weight vectors
- Estimated match proportion 0.273
Sample size for this cluster: 41
Farthest first selection of 41 weight vectors from 86 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 0.950, 0.923, 0.941] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 41 weight vectors
The oracle will correctly classify 41 weight vectors and wrongly classify 0
Classified 41 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 41 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(15)960_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 960), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)960_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 729
Number of entity ID pairs that occurred more than once: 0
Analyse set of 729 weight vectors
Containing 210 true matches and 519 true non-matches
(28.81% true matches)
Identified 695 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 678 (97.55%)
2 : 14 (2.01%)
3 : 2 (0.29%)
17 : 1 (0.14%)
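The occurrence table above is a standard multiset count. A minimal sketch, assuming the weight vectors are hashable tuples; the function name is illustrative, and the counts below are copied from the log, not recomputed:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Count how often each unique vector occurs, then tabulate how many
    unique vectors share each occurrence count."""
    per_vector = Counter(vectors)                # vector -> occurrence count
    return Counter(per_vector.values())          # occurrence count -> number of vectors

# Counts taken from the log above: 678 singletons, 14 pairs, 2 triples, one 17-fold
dist = Counter({1: 678, 2: 14, 3: 2, 17: 1})
unique = sum(dist.values())                      # 695 unique weight vectors
total = sum(k * v for k, v in dist.items())      # 729 weight vectors in total
for count, n in sorted(dist.items()):
    print(f"{count} : {n} ({100.0 * n / unique:.2f}%)")
```

Note the percentages are relative to the 695 unique vectors, not the 729 total, which is why the singleton row reads 97.55%.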
Identified 1 non-pure unique weight vector (from 695 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 516
Removed 1 non-pure weight vector
Final number of weight vectors to use: 728
Number of unique weight vectors: 695
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(695, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 695 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 695 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 611 weight vectors
Based on 31 matches and 53 non-matches
Classified 142 matches and 469 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(469, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 142 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 51 matches and 4 non-matches
Purity of oracle classification: 0.927
Entropy of oracle classification: 0.376
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(15)185_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 185), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)185_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1061
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1061 weight vectors
Containing 225 true matches and 836 true non-matches
(21.21% true matches)
Identified 1004 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 967 (96.31%)
2 : 34 (3.39%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1004 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 815
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1060
Number of unique weight vectors: 1004
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1004, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1004 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1004 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 917 weight vectors
Based on 24 matches and 63 non-matches
Classified 130 matches and 787 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(130, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(787, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 130 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 130 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.956, 1.000, 1.000, 1.000, 0.966, 1.000, 0.971] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-matches
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)530_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (15, 1 - acm diverg, 530), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)530_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 927
Number of entity ID pairs that occurred more than once: 0
Analyse set of 927 weight vectors
Containing 178 true matches and 749 true non-matches
(19.20% true matches)
Identified 888 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 858 (96.62%)
2 : 27 (3.04%)
3 : 2 (0.23%)
9 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 888 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 159
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 728
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 918
Number of unique weight vectors: 887
Time to load and analyse the weight vector file: 0.01 sec
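The pureness analysis above groups identical weight vectors and checks whether each group's record pairs are all matches or all non-matches. A minimal sketch of that computation (the function name and grouping key are my assumptions, not the tool's actual code):

```python
from collections import defaultdict

def pureness_per_unique_vector(vectors, is_match):
    """Group identical weight vectors; the pureness of a group is the
    fraction of its record pairs that are true matches."""
    groups = defaultdict(list)
    for vec, match in zip(vectors, is_match):
        groups[tuple(vec)].append(match)
    return {vec: sum(flags) / len(flags) for vec, flags in groups.items()}
```

A group with pureness strictly between 0 and 1 (like the 0.889 entry above, from a vector occurring 9 times with 8 matches) is non-pure and is removed before clustering.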
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(887, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 887 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 887 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
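The "far" selection above is a farthest-first traversal: starting from one vector, it repeatedly adds the vector whose minimum distance to the already-selected set is largest. A sketch under the assumption of Euclidean distance and a fixed starting vector (the real tool may choose the seed and metric differently):

```python
import numpy as np

def farthest_first(vectors, k):
    """Greedily select k vectors, each maximising its minimum
    Euclidean distance to the vectors selected so far."""
    X = np.asarray(vectors, dtype=float)
    selected = [0]                                  # fixed start (assumption)
    # distance of every vector to its closest selected vector so far
    d = np.linalg.norm(X - X[0], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(d))                     # farthest from the selection
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Each iteration costs one distance pass over all vectors, so selecting 86 of 887 seven-dimensional vectors is cheap, and the chosen samples spread across the weight-vector space rather than clustering near one region.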
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 23 matches and 63 non-matches
Purity of oracle classification: 0.733
Entropy of oracle classification: 0.838
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
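The purity and entropy figures printed above follow directly from the oracle's match/non-match counts: purity is the majority-class fraction, entropy is the binary Shannon entropy of the match proportion. A sketch (the function name is mine):

```python
from math import log2

def purity_and_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy of the match proportion p."""
    p = num_matches / (num_matches + num_non_matches)
    purity = max(p, 1.0 - p)
    entropy = 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))
    return purity, entropy

# 23 matches / 63 non-matches, as classified by the oracle above:
# purity ≈ 0.733, entropy ≈ 0.838
```

The "estimated match proportion" carried into the next loop's queue entries (0.267 in Loop 2) is simply p = 23/86 from this same sample.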
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 801 weight vectors
Based on 23 matches and 63 non-matches
Classified 89 matches and 712 non-matches
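The split above trains an SVM on the 86 oracle-labelled samples and uses its predictions to divide the remaining 801 weight vectors into two candidate clusters. A minimal scikit-learn sketch (the kernel and parameters are assumptions; the log does not show the tool's actual SVM settings):

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, rest_vecs):
    """Fit an SVM on oracle-labelled weight vectors, then split the
    unlabelled ones into predicted matches and predicted non-matches."""
    clf = SVC(kernel="linear")                 # kernel choice is an assumption
    clf.fit(np.asarray(train_vecs), train_labels)
    pred = clf.predict(np.asarray(rest_vecs))
    matches = [v for v, p in zip(rest_vecs, pred) if p == 1]
    non_matches = [v for v, p in zip(rest_vecs, pred) if p == 0]
    return matches, non_matches
```

Both resulting clusters are then pushed back onto the queue with the parent's purity/entropy estimates, which is why the two Loop 2 queue entries share identical statistics.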
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(89, 0.7325581395348837, 0.837769869006679, 0.26744186046511625)
(712, 0.7325581395348837, 0.837769869006679, 0.26744186046511625)
Current size of match and non-match training data sets: 23 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.73 and entropy 0.84
- Size 712 weight vectors
- Estimated match proportion 0.267
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 712 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing file: diverg(20)6_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 6), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)6_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 566 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 566 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 7 matches and 66 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.456
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)305_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 305), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)305_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 987
Number of entity ID pairs that occurred more than once: 0
Analyse set of 987 weight vectors
Containing 212 true matches and 775 true non-matches
(21.48% true matches)
Identified 935 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 900 (96.26%)
2 : 32 (3.42%)
3 : 2 (0.21%)
17 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 935 unique weight vectors)
Pureness (as the percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 180
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 754
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 986
Number of unique weight vectors: 935
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(935, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 935 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 935 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and misclassify 0
Classified 32 matches and 55 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
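The purity and entropy figures printed after each oracle step follow directly from the labelled match/non-match counts: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the match proportion. A minimal sketch (the function name is illustrative, not from the tool):

```python
import math

def cluster_stats(num_matches, num_non_matches):
    """Purity, binary entropy, and match proportion of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total                    # estimated match proportion
    purity = max(p, 1.0 - p)                   # fraction in the majority class
    if p in (0.0, 1.0):
        entropy = 0.0
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy, p

# The 87 oracle-labelled vectors above: 32 matches, 55 non-matches
purity, entropy, match_prop = cluster_stats(32, 55)
print(round(purity, 3), round(entropy, 3), round(match_prop, 3))  # 0.632 0.949 0.368
```

The same three numbers reappear in the Loop 2 queue listing below, since both child clusters initially inherit the statistics of the labelled sample.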
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 848 weight vectors
Based on 32 matches and 55 non-matches
Classified 293 matches and 555 non-matches
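The SVM step trains a classifier on the oracle-labelled vectors (here 32 matches and 55 non-matches) and uses it to split the 848 remaining unlabelled vectors into a predicted-match and a predicted-non-match cluster, both of which go back onto the queue. A self-contained sketch using a linear SVM trained with the Pegasos sub-gradient method; the real tool's kernel and parameters are not shown in this log, so treat these details as assumptions:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos sub-gradient training of a linear SVM; labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            margin = y[i] * (X[i] @ w + b)
            w *= (1.0 - eta * lam)             # regularisation shrink
            if margin < 1:                     # hinge-loss sub-gradient step
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

def svm_split(w, b, X_unlabelled):
    """Split unlabelled vectors into (predicted matches, predicted non-matches)."""
    scores = X_unlabelled @ w + b
    return X_unlabelled[scores >= 0], X_unlabelled[scores < 0]
```

Pushing both halves back is why the next iteration reports "Loop 2: Queue length: 2".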
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(293, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
(555, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
Current size of match and non-match training data sets: 32 / 55
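Each loop iteration pops a cluster from the queue, labels a farthest-first sample via the manual oracle, and, if the cluster is still impure and large enough to split, classifies the unlabelled remainder and pushes both halves back. A schematic of that control flow, with the sampling, labelling, and splitting steps passed in as callables; all names and thresholds here are assumptions:

```python
import random

def cluster_loop(initial, sample_fn, label_fn, split_fn, budget,
                 min_purity=0.95, min_size=10):
    """Schematic active-learning loop over a queue of weight-vector clusters.
    sample_fn, label_fn and split_fn stand in for farthest-first selection,
    the manual oracle and the SVM split."""
    queue, labelled, used = [list(initial)], [], 0
    while queue and used < budget:
        random.shuffle(queue)                  # "queue ordering: random"
        cluster = queue.pop()
        sample = sample_fn(cluster)            # e.g. farthest-first selection
        labels = label_fn(sample)              # manual oracle classification
        labelled += list(zip(sample, labels))
        used += len(sample)
        rest = [v for v in cluster if v not in sample]
        matches = sum(labels)
        purity = max(matches, len(labels) - matches) / max(len(labels), 1)
        if purity < min_purity and len(rest) >= min_size:
            # split the unlabelled remainder and push both halves back
            queue.extend(split_fn(rest, labelled))
    return labelled
```

The "Reached end of manual classification budget" lines in this log correspond to the `used < budget` condition failing.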
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 293 weight vectors
- Estimated match proportion 0.368
Sample size for this cluster: 68
Farthest-first selection of 68 weight vectors from 293 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
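The sample listed above is produced by farthest-first traversal (Gonzalez's heuristic): start from one vector, then repeatedly add the vector whose distance to the nearest already-selected vector is largest, so the sample spreads across the whole cluster. A minimal sketch; the Euclidean metric and the random starting vector are assumptions:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Select k vectors that are maximally spread out (farthest-first traversal)."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]           # arbitrary starting vector
    # min_dist[i] = distance from X[i] to its nearest already-selected vector
    min_dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))               # farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[selected]
```

Each new pick costs one distance pass over the cluster, so selecting k of n vectors takes O(kn) distance computations.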
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and misclassify 0
Classified 44 matches and 24 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 24
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(10)577_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 577), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)577_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 699
Number of entity ID pairs that occurred more than once: 0
Analyse set of 699 weight vectors
Containing 219 true matches and 480 true non-matches
(31.33% true matches)
Identified 644 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 608 (94.41%)
2 : 33 (5.12%)
3 : 2 (0.31%)
19 : 1 (0.16%)
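A distribution like the one above (how many unique weight vectors occur once, twice, and so on) can be produced with two nested counts. A sketch using collections.Counter; the example data are illustrative:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique weight vectors seen that often."""
    per_vector = Counter(tuple(v) for v in weight_vectors)   # vector -> occurrences
    return Counter(per_vector.values())                      # occurrences -> #vectors

vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9)]
print(occurrence_distribution(vectors))   # Counter({1: 2, 2: 1})
```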
Identified 1 non-pure unique weight vector (from 644 unique weight vectors)
Pureness (proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 459
Removed 1 non-pure weight vector
Final number of weight vectors to use: 698
Number of unique weight vectors: 644
Time to load and analyse the weight vector file: 0.01 sec
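The "non-pure" cleanup removes unique weight vectors that occur with both true labels, so that every remaining vector has a single ground truth. The log shows two behaviours — dropping only the minority-label copies (pureness 0.947 here) or dropping every copy (pureness 0.875 in a later file) — which suggests a pureness threshold; the cutoff and names below are assumptions:

```python
from collections import Counter

def remove_non_pure(labelled, min_pureness=0.9):
    """Clean (vector, is_match) pairs: keep majority-label copies of vectors whose
    pureness >= min_pureness (assumed cutoff); drop all copies below it."""
    counts = Counter((tuple(v), m) for v, m in labelled)
    cleaned = []
    for v, m in labelled:
        same = counts[(tuple(v), m)]
        other = counts[(tuple(v), not m)]
        if other == 0:                               # already pure: keep as-is
            cleaned.append((v, m))
            continue
        pureness = max(same, other) / (same + other)
        if pureness >= min_pureness and same > other:
            cleaned.append((v, m))                   # keep majority copies only
    return cleaned
```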
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(644, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 644 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest-first selection of 83 weight vectors from 644 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and misclassify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 561 weight vectors
Based on 28 matches and 55 non-matches
Classified 156 matches and 405 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(405, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 156 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 56
Farthest-first selection of 56 weight vectors from 156 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and misclassify 0
Classified 51 matches and 5 non-matches
Purity of oracle classification: 0.911
Entropy of oracle classification: 0.434
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(20)272_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 272), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)272_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority-class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest-first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and misclassify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest-first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and misclassify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)858_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990385
recall 0.344482
f-measure 0.511166
da 104
dm 0
ndm 0
tp 103
fp 1
tn 4.76529e+07
fn 196
Name: (10, 1 - acm diverg, 858), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)858_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 579
Number of entity ID pairs that occurred more than once: 0
Analyse set of 579 weight vectors
Containing 149 true matches and 430 true non-matches
(25.73% true matches)
Identified 562 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 553 (98.40%)
2 : 6 (1.07%)
3 : 2 (0.36%)
8 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 562 unique weight vectors)
Pureness (proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 134
0.875 : 1 (all weight vectors with this pureness to be removed)
0.000 : 427
Removed 8 non-pure weight vectors
Final number of weight vectors to use: 571
Number of unique weight vectors: 561
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(561, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 561 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
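The sample sizes reported throughout this log (82 of 561, 87 of 1002, 88 of 1044, 69 of 376, 77 of 738, 44 of 109) are all consistent with Cochran's sample-size formula with finite-population correction, assuming a 95% confidence level (z = 1.96) and a 10% margin of error, with p(1-p) taken from the cluster's estimated match proportion; a sketch (the exact parameters are an inference from the numbers, not stated in the log):

```python
def sample_size(population, match_proportion, z=1.96, error=0.1):
    """Cochran's sample size with finite-population correction.

    Uses the cluster's estimated match proportion p for the p*(1-p) term;
    p = 0.5 (the most conservative value) in the first loop.
    """
    pq = match_proportion * (1.0 - match_proportion)
    numerator = population * z * z * pq
    denominator = (population - 1) * error * error + z * z * pq
    return int(numerator / denominator)  # truncation reproduces the logged sizes
```

For this cluster, `sample_size(561, 0.5)` gives 82; in Loop 2 the updated estimate 27/82 gives `sample_size(376, 27/82)` = 69.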
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 561 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.667, 0.571, 0.563, 0.333, 0.867] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
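The farthest-first ("far") selection above is presumably a greedy k-center traversal: each step adds the vector whose minimum Euclidean distance to the already-selected set is largest. A minimal sketch (the choice of start vector is an assumption; the actual implementation may seed it differently):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[start]]
    # minimum distance from each candidate to the selected set so far
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[idx]))
    return selected
```

This explains why the selected sample above is dominated by extreme vectors (many 0.000/1.000 components): the greedy pass favours the corners of the weight-vector space.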
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 27 matches and 55 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
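The purity and entropy figures above follow directly from the oracle's match/non-match counts: purity is the majority-class fraction and entropy is the binary (base-2) entropy of the class split. A sketch:

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary entropy (base 2)."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:                      # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy
```

For the 27 matches / 55 non-matches above this gives purity 0.671 and entropy 0.914; the estimated match proportion 27/82 ≈ 0.329 then reappears in the Loop 2 queue entries.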
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 479 weight vectors
Based on 27 matches and 55 non-matches
Classified 103 matches and 376 non-matches
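The SVM step trains on the oracle-labelled sample (27 matches vs. 55 non-matches here) and classifies the remaining 479 unlabelled vectors into two child clusters. The log does not say which SVM implementation is used; as a stand-in, a linear SVM trained with the Pegasos stochastic sub-gradient method can be sketched in plain Python:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the regularised hinge loss.

    X: list of feature tuples; y: labels in {+1, -1}.
    A bias term is folded in as a constant extra feature.
    This is an illustrative stand-in, not the log's actual SVM.
    """
    rng = random.Random(seed)
    dim = len(X[0]) + 1                           # +1 for the bias feature
    w = [0.0] * dim
    for t in range(1, epochs + 1):
        i = rng.randrange(len(X))
        x = list(X[i]) + [1.0]
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrinkage
        if margin < 1.0:                          # hinge-loss sub-gradient step
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w

def classify(w, x):
    """+1 = match, -1 = non-match."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1
```

The predicted matches and non-matches become the two child clusters pushed onto the queue, each initially inheriting the parent sample's purity, entropy, and estimated match proportion, as the Loop 2 printout shows.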
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(103, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
(376, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
Current size of match and non-match training data sets: 27 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 376 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 376 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.400, 0.737, 0.529, 0.750, 0.367] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.600, 0.700, 0.600, 0.611, 0.706] (False)
[1.000, 0.000, 0.296, 0.600, 0.471, 0.600, 0.643] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.615, 0.826, 0.286, 0.857, 0.643] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 4 matches and 65 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.319
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
104.0
Analysing file: diverg(20)173_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 173), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)173_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1059
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1059 weight vectors
Containing 227 true matches and 832 true non-matches
(21.44% true matches)
Identified 1002 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 965 (96.31%)
2 : 34 (3.39%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1002 unique weight vectors)
Pureness (fraction of matches) of each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 811
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1058
Number of unique weight vectors: 1002
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1002, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1002 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1002 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 915 weight vectors
Based on 30 matches and 57 non-matches
Classified 177 matches and 738 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(177, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(738, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.93
- Size 738 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 738 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.667, 0.538, 0.455, 0.581, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)323_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 323), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)323_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (fraction of matches) of each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
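The purity and entropy figures reported throughout this log can be reproduced from the match/non-match counts; a minimal sketch, assuming purity is the majority-class fraction and entropy the binary Shannon entropy of the split (the log does not state the exact formulas):

```python
from math import log2

def purity(n_match, n_nonmatch):
    """Fraction of labelled vectors belonging to the majority class."""
    return max(n_match, n_nonmatch) / (n_match + n_nonmatch)

def entropy(n_match, n_nonmatch):
    """Binary Shannon entropy of the match/non-match split."""
    p = n_match / (n_match + n_nonmatch)
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = purity(23, 65)    # logged above as 0.739
h = entropy(23, 65)   # logged above as 0.829
```

With 23 matches and 65 non-matches these formulas give 0.739 and 0.829, matching the logged values, and a fully one-sided split (0 matches, 66 non-matches) gives purity 1.000 and entropy 0.000 as seen later in the log.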
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
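The SVM step above apparently trains on the oracle-labelled vectors and splits the remaining cluster by predicted class; the log does not name the implementation, so this is a sketch assuming scikit-learn's `SVC` with illustrative data:

```python
from sklearn.svm import SVC

# Oracle-labelled training data (illustrative, not from the log):
# each row is a weight vector, each label a match/non-match decision.
train_X = [[0.9, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.1]]
train_y = [True, True, False, False]
# Unlabelled weight vectors remaining in the cluster.
remaining = [[0.95, 0.85], [0.15, 0.25]]

clf = SVC(kernel="linear").fit(train_X, train_y)
pred = clf.predict(remaining)

# Split the cluster into predicted-match and predicted-non-match children,
# which then re-enter the queue as two separate clusters.
match_cluster = [v for v, p in zip(remaining, pred) if p]
nonmatch_cluster = [v for v, p in zip(remaining, pred) if not p]
```

The two child clusters correspond to the "Classified N matches and M non-matches" counts that appear in the queue on the next loop iteration.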
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
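The "Farthest first selection" steps in this log greedily pick vectors that maximise the minimum distance to those already chosen; a minimal sketch, assuming Euclidean distance and seeding with the first vector (neither of which the log confirms):

```python
def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of numeric tuples."""
    selected = [vectors[0]]          # assumed seed: the first vector
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        def min_dist(v):
            # squared Euclidean distance to the nearest selected vector
            return min(sum((a - b) ** 2 for a, b in zip(v, s))
                       for s in selected)
        nxt = max(remaining, key=min_dist)
        remaining.remove(nxt)
        selected.append(nxt)
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)], 2)
```

Because each pick maximises distance to the current sample, the selection spreads over the extremes of the weight-vector space, which is why the sampled vectors above mix clear matches and clear non-matches.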
39.0
Analysing the file: diverg(10)782_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (10, 1 - acm diverg, 782), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)782_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 291
Number of entity ID pairs that occurred more than once: 0
Analyse set of 291 weight vectors
Containing 199 true matches and 92 true non-matches
(68.38% true matches)
Identified 259 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 245 (94.59%)
2 : 11 (4.25%)
3 : 2 (0.77%)
18 : 1 (0.39%)
Identified 1 non-pure unique weight vector (from 259 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 167
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 91
Removed 1 non-pure weight vector
Final number of weight vectors to use: 290
Number of unique weight vectors: 259
Time to load and analyse the weight vector file: 0.00 sec
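The analysis above deduplicates weight vectors, tabulates how often each unique vector occurs, and removes minority-class copies of non-pure vectors; a sketch with illustrative data (the real loader reads the CSV columns listed above):

```python
from collections import Counter

def analyse(weight_vectors, labels):
    """Group identical weight vectors; return occurrence frequencies and
    the vectors kept after dropping minority-class copies of non-pure ones."""
    groups = {}
    for vec, is_match in zip(weight_vectors, labels):
        groups.setdefault(tuple(vec), []).append(is_match)
    # Occurrence : number of unique vectors occurring that often
    freq = Counter(len(lab) for lab in groups.values())
    kept = []
    for vec, lab in groups.items():
        pureness = sum(lab) / len(lab)     # proportion of matches
        majority = pureness >= 0.5
        kept.extend(vec for l in lab if l == majority)
    return freq, kept

freq, kept = analyse([[1.0], [1.0], [1.0], [0.2]],
                     [True, True, False, False])
```

In the illustrative call, one unique vector occurs three times with a 0.667 match proportion, so its single minority-class (non-match) copy is removed, mirroring the "Removed 1 non-pure weight vectors" step above.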
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(259, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 259 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 70
Perform initial selection using "far" method
Farthest first selection of 70 weight vectors from 259 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 32 matches and 38 non-matches
Purity of oracle classification: 0.543
Entropy of oracle classification: 0.995
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 38
Number of false non-matches: 0
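The oracle is configured here with 100.00% accuracy, i.e. it never flips a label, which is why every false-match and false-non-match count above is zero; a sketch of such a noisy oracle (hypothetical helper; the deterministic seed is an assumption):

```python
import random

def oracle(true_labels, accuracy=1.0, rng=None):
    """Return each true label unchanged with probability `accuracy`,
    flipped otherwise. random.random() is always < 1.0, so
    accuracy=1.0 reproduces the labels exactly."""
    rng = rng or random.Random(0)
    return [lab if rng.random() < accuracy else not lab
            for lab in true_labels]
```

With accuracy below 1.0 some labels come back flipped, which would show up in the "Number of false matches/non-matches" lines of this log.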
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 189 weight vectors
Based on 32 matches and 38 non-matches
Classified 137 matches and 52 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 70
Size, purity, entropy, and estimated match proportion of clusters in queue:
(137, 0.5428571428571428, 0.9946937953613058, 0.45714285714285713)
(52, 0.5428571428571428, 0.9946937953613058, 0.45714285714285713)
Current size of match and non-match training data sets: 32 / 38
Selected cluster (queue ordering: random) with:
- Purity 0.54 and entropy 0.99
- Size 137 weight vectors
- Estimated match proportion 0.457
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 137 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 52 matches and 4 non-matches
Purity of oracle classification: 0.929
Entropy of oracle classification: 0.371
Number of true matches: 52
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
48.0
Analysing the file: diverg(10)392_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976923
recall 0.424749
f-measure 0.592075
da 130
dm 0
ndm 0
tp 127
fp 3
tn 4.76529e+07
fn 172
Name: (10, 1 - acm diverg, 392), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)392_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 540
Number of entity ID pairs that occurred more than once: 0
Analyse set of 540 weight vectors
Containing 130 true matches and 410 true non-matches
(24.07% true matches)
Identified 509 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 481 (94.50%)
2 : 25 (4.91%)
3 : 3 (0.59%)
Identified 0 non-pure unique weight vectors (from 509 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 119
0.000 : 390
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 540
Number of unique weight vectors: 509
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(509, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 509 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 509 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 26 matches and 55 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.905
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 428 weight vectors
Based on 26 matches and 55 non-matches
Classified 116 matches and 312 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(116, 0.6790123456790124, 0.9054522631867894, 0.32098765432098764)
(312, 0.6790123456790124, 0.9054522631867894, 0.32098765432098764)
Current size of match and non-match training data sets: 26 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 312 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 66
Farthest first selection of 66 weight vectors from 312 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.767, 0.545, 0.818, 0.714, 0.773] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.423, 0.478, 0.357, 0.615, 0.727] (False)
[1.000, 0.000, 0.750, 0.533, 0.294, 0.333, 0.429] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.692, 0.292, 0.500, 0.818, 0.308] (False)
[0.800, 0.000, 0.625, 0.571, 0.467, 0.474, 0.667] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.333, 0.214, 0.333, 0.588, 0.476] (False)
[1.000, 0.000, 0.583, 0.786, 0.842, 0.800, 0.833] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.778, 0.500, 0.789, 0.750, 0.385] (False)
[1.000, 0.000, 0.333, 0.600, 0.800, 0.778, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.741, 0.556, 0.667, 0.350, 0.556] (False)
[1.000, 0.000, 0.833, 0.833, 0.550, 0.500, 0.688] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.875, 0.467, 0.471, 0.833, 0.571] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.857, 0.000, 0.500, 0.389, 0.235, 0.045, 0.526] (False)
[1.000, 0.000, 0.429, 0.571, 0.333, 0.444, 0.400] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 0.000, 0.600, 0.857, 0.579, 0.286, 0.545] (False)
[1.000, 0.000, 0.500, 0.375, 0.417, 0.259, 0.250] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.367, 0.429, 0.571, 0.306, 0.762] (False)
[1.000, 0.000, 0.000, 0.700, 0.818, 0.444, 0.619] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.407, 0.643, 0.667, 0.500, 0.563] (False)
[1.000, 0.000, 0.767, 0.667, 0.545, 0.786, 0.773] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 66 weight vectors
The oracle will correctly classify 66 weight vectors and wrongly classify 0
Classified 0 matches and 66 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 66 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
130.0
Analysing the file: diverg(10)97_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (10, 1 - acm diverg, 97), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)97_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 953
Number of entity ID pairs that occurred more than once: 0
Analyse set of 953 weight vectors
Containing 216 true matches and 737 true non-matches
(22.67% true matches)
Identified 898 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 862 (95.99%)
2 : 33 (3.67%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 898 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 716
Removed 1 non-pure weight vector
Final number of weight vectors to use: 952
Number of unique weight vectors: 898
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(898, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 898 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 898 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
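The "Farthest first selection" steps in this log can be sketched as greedy max-min selection: repeatedly pick the vector whose distance to the closest already-selected vector is largest. This is a minimal sketch, assuming Euclidean distance and seeding from the first vector (the actual seeding rule and metric are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy max-min selection: each new pick maximises its
    distance to the nearest already-selected vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # seed choice is an assumption
    # distance from every vector to its nearest selected vector
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[i])
        min_dist = [min(d, dist(v, vectors[i]))
                    for d, v in zip(min_dist, vectors)]
    return selected
```

Each round costs one pass of distance updates, so selecting k of n vectors is O(nk) distance computations.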
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
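The purity and entropy figures reported after each oracle round are consistent with the majority-class fraction and the binary Shannon entropy of the match proportion. A sketch of those two formulas (function names are illustrative):

```python
import math

def purity(matches, non_matches):
    """Fraction of the majority class among the classified vectors."""
    total = matches + non_matches
    return max(matches, non_matches) / total

def entropy(matches, non_matches):
    """Binary Shannon entropy (bits) of the match proportion."""
    p = matches / (matches + non_matches)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# reproduces the figures above: 24 matches, 62 non-matches
print(round(purity(24, 62), 3), round(entropy(24, 62), 3))  # 0.721 0.854
```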
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 812 weight vectors
Based on 24 matches and 62 non-matches
Classified 112 matches and 700 non-matches
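The SVM step itself, training on the oracle-labelled sample and then classifying the remaining cluster members, is not shown in the log. In practice scikit-learn's `SVC` would be an obvious choice; a dependency-free stand-in using Pegasos-style sub-gradient descent for a linear SVM might look like this (the hyper-parameters and bias handling are assumptions, not taken from the pipeline):

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=2000, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM.
    X: feature vectors, y: labels in {-1, +1}. The bias is
    folded in by appending a constant feature of 1.0."""
    X = [list(x) + [1.0] for x in X]
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            score = sum(wj * xj for wj, xj in zip(w, X[i]))
            if y[i] * score < 1:  # hinge-loss violation
                w = [(1 - eta * lam) * wj + eta * y[i] * xj
                     for wj, xj in zip(w, X[i])]
            else:  # only shrink (regularisation step)
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Trained on the sample's matches (+1) and non-matches (-1), `predict` then splits the unlabelled remainder into the two child clusters seen in the queue.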
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(700, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 112 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 46
Farthest first selection of 46 weight vectors from 112 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 46 weight vectors
The oracle will correctly classify 46 weight vectors and wrongly classify 0
Classified 44 matches and 2 non-matches
Purity of oracle classification: 0.957
Entropy of oracle classification: 0.258
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 46 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)611_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 611), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)611_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
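Frequency tables like the one above follow from counting how often each distinct weight vector occurs, then counting those counts. A minimal sketch (the example vectors are illustrative):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map each occurrence count to the number of distinct
    weight vectors that occur exactly that often."""
    per_vector = Counter(tuple(v) for v in vectors)
    return Counter(per_vector.values())

vecs = [(1.0, 0.5), (1.0, 0.5), (0.3, 0.2), (0.9, 0.9), (0.9, 0.9), (0.9, 0.9)]
print(sorted(occurrence_distribution(vecs).items()))  # [(1, 1), (2, 1), (3, 1)]
```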
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
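The non-pure-vector clean-up reported above, dropping the minority-class copies of any weight vector that occurs with both labels, can be sketched as follows (the tie-break at pureness 0.5 is an assumption not visible in the log):

```python
from collections import defaultdict

def remove_minority_copies(vectors, labels):
    """Group identical weight vectors, compute each group's pureness
    (fraction of match labels), and keep only majority-class copies."""
    groups = defaultdict(list)
    for v, is_match in zip(vectors, labels):
        groups[tuple(v)].append(is_match)
    kept_vectors, kept_labels = [], []
    for v, group_labels in groups.items():
        # pureness >= 0.5 means matches are the majority (tie-break is an assumption)
        majority = sum(group_labels) / len(group_labels) >= 0.5
        for is_match in group_labels:
            if is_match == majority:
                kept_vectors.append(list(v))
                kept_labels.append(is_match)
    return kept_vectors, kept_labels
```

Pure groups (all copies carry the same label) pass through unchanged; a group like the 0.950-pureness vector above loses only its single minority-class copy.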
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)68_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 68), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)68_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.85
- Size 123 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 47 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(10)849_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (10, 1 - acm diverg, 849), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)849_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 617
Number of entity ID pairs that occurred more than once: 0
Analyse set of 617 weight vectors
Containing 200 true matches and 417 true non-matches
(32.41% true matches)
Identified 568 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 534 (94.01%)
2 : 31 (5.46%)
3 : 2 (0.35%)
15 : 1 (0.18%)
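A distribution like the one above (how often each unique weight vector occurs, and how many unique vectors share each occurrence count) can be computed with `collections.Counter`; the vector values below are illustrative, not from the log:

```python
from collections import Counter

# Hypothetical weight vectors (tuples, so they are hashable)
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9),
           (0.2, 0.9), (0.2, 0.9), (0.7, 0.1)]

vector_counts = Counter(vectors)             # occurrences per unique vector
freq_dist = Counter(vector_counts.values())  # occurrence -> number of unique vectors

print("Occurrence : Number of weight vectors that occur that often")
for occ in sorted(freq_dist):
    n = freq_dist[occ]
    print(f"{occ} : {n} ({100.0 * n / len(vector_counts):.2f}%)")
```

The percentages are taken over unique vectors, which is consistent with the log (e.g. 534/568 = 94.01%).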
Identified 1 non-pure unique weight vector (from 568 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 171
0.933 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 396
Removed 1 non-pure weight vector
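The removal step above (dropping the minority-class copies of a non-pure unique weight vector) can be sketched as follows; the function name is an assumption:

```python
def remove_minority_class(labels):
    """Given the True/False labels of all copies of one unique weight
    vector, keep only the copies belonging to the majority class."""
    matches = sum(labels)
    non_matches = len(labels) - matches
    majority = matches >= non_matches
    return [lab for lab in labels if lab == majority]

# A 0.933-pure vector (14 matches, 1 non-match): the single
# minority-class copy is removed, leaving 14 of 15 copies.
print(len(remove_minority_class([True] * 14 + [False])))  # 14
```

This matches the log's bookkeeping: one copy removed takes the 617 weight vectors down to 616.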
Final number of weight vectors to use: 616
Number of unique weight vectors: 568
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(568, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 568 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 568 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
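The "far" selection above is a farthest-first traversal: each step picks the vector whose minimum distance to the already-selected set is largest. A minimal sketch, assuming Euclidean distance and an arbitrary first seed (the original program's seeding and metric may differ):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of numeric tuples."""
    selected = [vectors[0]]                              # arbitrary seed
    # minimum distance from each candidate to the selected set
    dist = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=dist.__getitem__)
        selected.append(vectors[idx])
        for i, v in enumerate(vectors):                  # update min distances
            dist[i] = min(dist[i], math.dist(v, vectors[idx]))
    return selected

sample = farthest_first([(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.0, 1.0)], 2)
print(sample)  # [(0.0, 0.0), (1.0, 1.0)]
```

The greedy update keeps the selection spread out, which is why the sampled vectors above mix clear matches and clear non-matches.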
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 26 matches and 56 non-matches
Purity of oracle classification: 0.683
Entropy of oracle classification: 0.901
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 486 weight vectors
Based on 26 matches and 56 non-matches
Classified 141 matches and 345 non-matches
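The SVM step above (train on the oracle-labelled vectors, then classify the rest of the cluster) can be sketched with scikit-learn; this is a sketch with illustrative data and default settings, not the author's exact configuration:

```python
from sklearn import svm

# Oracle-labelled training data: weight vectors plus True/False labels
train_vectors = [[1.0, 1.0], [0.9, 0.8], [0.1, 0.0], [0.2, 0.1]]
train_labels = [True, True, False, False]

clf = svm.SVC()                      # default RBF kernel; an assumption
clf.fit(train_vectors, train_labels)

# Classify the remaining (unlabelled) weight vectors in the cluster
remaining = [[0.95, 0.9], [0.05, 0.1]]
predictions = clf.predict(remaining)
n_matches = int(sum(predictions))
print(f"Classified {n_matches} matches and "
      f"{len(predictions) - n_matches} non-matches")
```

The predicted labels then split the cluster into a candidate-match and a candidate-non-match sub-cluster, which is why the queue length grows to 2 in the next loop.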
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(141, 0.6829268292682927, 0.9011701959974223, 0.3170731707317073)
(345, 0.6829268292682927, 0.9011701959974223, 0.3170731707317073)
Current size of match and non-match training data sets: 26 / 56
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 141 weight vectors
- Estimated match proportion 0.317
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 141 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing file: diverg(15)215_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kiddo!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 215), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)215_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 804
Number of entity ID pairs that occurred more than once: 0
Analyse set of 804 weight vectors
Containing 226 true matches and 578 true non-matches
(28.11% true matches)
Identified 765 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 746 (97.52%)
2 : 16 (2.09%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 765 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 575
Removed 1 non-pure weight vector
Final number of weight vectors to use: 803
Number of unique weight vectors: 765
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(765, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 765 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 765 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 30 matches and 55 non-matches
Purity of oracle classification: 0.647
Entropy of oracle classification: 0.937
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 680 weight vectors
Based on 30 matches and 55 non-matches
Classified 153 matches and 527 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
(527, 0.6470588235294118, 0.9366673818775626, 0.35294117647058826)
Current size of match and non-match training data sets: 30 / 55
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.94
- Size 527 weight vectors
- Estimated match proportion 0.353
Sample size for this cluster: 75
Farthest first selection of 75 weight vectors from 527 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 4 matches and 71 non-matches
Purity of oracle classification: 0.947
Entropy of oracle classification: 0.300
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 71
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(10)967_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kiddo!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.976378
recall 0.414716
f-measure 0.58216
da 127
dm 0
ndm 0
tp 124
fp 3
tn 4.76529e+07
fn 175
Name: (10, 1 - acm diverg, 967), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)967_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 947
Number of entity ID pairs that occurred more than once: 0
Analyse set of 947 weight vectors
Containing 141 true matches and 806 true non-matches
(14.89% true matches)
Identified 913 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 884 (96.82%)
2 : 26 (2.85%)
3 : 2 (0.22%)
5 : 1 (0.11%)
Identified 0 non-pure unique weight vectors (from 913 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 127
0.000 : 786
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 947
Number of unique weight vectors: 913
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(913, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 913 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 913 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 29 matches and 58 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 826 weight vectors
Based on 29 matches and 58 non-matches
Classified 246 matches and 580 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(246, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(580, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 29 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.92
- Size 246 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 63
Farthest first selection of 63 weight vectors from 246 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
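The farthest-first selection used for the samples above is the classic greedy max-min traversal: start from one vector, then repeatedly add the vector whose distance to its nearest already-selected vector is largest. A sketch under the assumption of Euclidean distance (the log does not state the metric or the seeding rule):

```python
import numpy as np

def farthest_first(X, k, seed=0):
    """Greedy farthest-first traversal over the rows of X (sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    chosen = [int(rng.integers(len(X)))]
    # distance of every vector to its nearest already-chosen vector
    dist = np.linalg.norm(X - X[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(dist.argmax())            # farthest from the chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

sample = farthest_first(np.random.default_rng(1).random((246, 7)), 63)
print(len(sample), len(set(sample)))  # 63 63
```

This is why the selected vectors above are so spread out: near-duplicates have near-zero min-distance and are never picked.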
Perform oracle with 100.00% accuracy on 63 weight vectors
The oracle will correctly classify 63 weight vectors and wrongly classify 0
Classified 37 matches and 26 non-matches
Purity of oracle classification: 0.587
Entropy of oracle classification: 0.978
Number of true matches: 37
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 63 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
127.0
Analyzing file: diverg(20)294_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 294), dtype: object
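The block above looks like one row of a results DataFrame with a three-level index, printed as a pandas Series of shape (13,). A minimal reproduction; the column names are taken from the dump, while the DataFrame construction itself is assumed:

```python
import pandas as pd

# Hypothetical results table with a (param, approach, run) MultiIndex;
# only the column names are taken from the log dump above.
df = pd.DataFrame(
    {"abordagem": ["DS"], "iteracao": [0], "inspecoesManuais": [0],
     "precision": [1.0], "recall": [0.130435], "f-measure": [0.230769],
     "da": [39], "dm": [0], "ndm": [0], "tp": [39], "fp": [0],
     "tn": [4.76529e7], "fn": [260]},
    index=pd.MultiIndex.from_tuples([(20, "1 - acm diverg", 294)]),
)
row = df.loc[(20, "1 - acm diverg", 294)]   # one row -> a Series
print(type(row), row.shape)  # <class 'pandas.core.series.Series'> (13,)
```

The mixed int/float/str columns explain the `dtype: object` in the printed Series.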
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)294_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
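The occurrence distribution above ("Occurrence : Number of weight vectors that occur that often") is two nested counts: first count each unique weight vector, then count the counts. A minimal sketch with toy vectors:

```python
from collections import Counter

# Toy stand-in for the deduplication step: tuples act as hashable
# weight vectors.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.2, 0.3), (0.2, 0.3),
           (0.9, 0.1)]
counts = Counter(vectors)            # vector -> how often it occurs
dist = Counter(counts.values())      # occurrence -> number of unique vectors
print(sorted(dist.items()))          # [(1, 1), (2, 1), (3, 1)]
```

In the run above this yields the table 1:982, 2:34, 3:2, 20:1 over the 1019 unique vectors.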
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
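The pureness clean-up above removes the minority-label copies of any unique weight vector that occurs with both labels; here one vector with pureness 0.950 (19 match copies, 1 non-match copy) loses its single non-match copy, so 1076 vectors become 1075. A sketch of that rule, with the exact code assumed:

```python
def remove_minority(labels):
    """labels: True/False ground-truth labels for all copies of one
    unique weight vector. Drop the minority-class copies if mixed."""
    pureness = sum(labels) / len(labels)     # fraction of match copies
    if pureness in (0.0, 1.0):
        return labels                        # already pure, keep all
    majority = pureness > 0.5
    return [lab for lab in labels if lab == majority]

print(len(remove_minority([True] * 19 + [False])))  # 19
```

Pure vectors (pureness 0.000 or 1.000) are kept untouched, which is why only one vector is removed in this run.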
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 112 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 45
Farthest first selection of 45 weight vectors from 112 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 45 weight vectors
The oracle will correctly classify 45 weight vectors and wrongly classify 0
Classified 44 matches and 1 non-match
Purity of oracle classification: 0.978
Entropy of oracle classification: 0.154
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 45 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(20)215_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 215), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)215_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 808
Number of entity ID pairs that occurred more than once: 0
Analyse set of 808 weight vectors
Containing 226 true matches and 582 true non-matches
(27.97% true matches)
Identified 769 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (97.53%)
2 : 16 (2.08%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 769 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 579
Removed 1 non-pure weight vector
Final number of weight vectors to use: 807
Number of unique weight vectors: 769
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(769, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 769 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 769 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.733, 0.000, 0.176, 0.348, 0.351, 0.217, 0.188] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 29 matches and 56 non-matches
Purity of oracle classification: 0.659
Entropy of oracle classification: 0.926
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 684 weight vectors
Based on 29 matches and 56 non-matches
Classified 146 matches and 538 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
(538, 0.6588235294117647, 0.9259400597385791, 0.3411764705882353)
Current size of match and non-match training data sets: 29 / 56
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 146 weight vectors
- Estimated match proportion 0.341
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 146 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 50 matches and 4 non-matches
Purity of oracle classification: 0.926
Entropy of oracle classification: 0.381
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)449_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 449), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)449_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1038
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1038 weight vectors
Containing 207 true matches and 831 true non-matches
(19.94% true matches)
Identified 991 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 956 (96.47%)
2 : 32 (3.23%)
3 : 2 (0.20%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 991 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 810
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1037
Number of unique weight vectors: 991
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(991, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 991 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 991 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 31 matches and 56 non-matches
Purity of oracle classification: 0.644
Entropy of oracle classification: 0.940
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
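The purity (0.644) and entropy (0.940) reported for this oracle round follow directly from the 31 match / 56 non-match split; a minimal sketch, assuming purity is the majority-class fraction and entropy the binary Shannon entropy of the match proportion:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary Shannon entropy of a labelled sample."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The 31 matches / 56 non-matches classified above:
purity, entropy = purity_entropy(31, 56)
print(round(purity, 3), round(entropy, 3))  # → 0.644 0.94
```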
SVM classification of 904 weight vectors
Based on 31 matches and 56 non-matches
Classified 307 matches and 597 non-matches
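The SVM step above trains on the oracle-labelled sample and partitions the remaining unlabelled weight vectors into predicted matches and non-matches. A minimal sketch of that pattern, assuming scikit-learn's `SVC` with default settings (the log does not reveal the actual kernel or parameters):

```python
from sklearn.svm import SVC

def split_cluster(labeled_vecs, labels, unlabeled_vecs):
    """Fit an SVM on the oracle-classified sample (label 1 = match,
    0 = non-match) and split the remaining weight vectors accordingly."""
    clf = SVC()  # hypothetical default settings; not shown in the log
    clf.fit(labeled_vecs, labels)
    pred = clf.predict(unlabeled_vecs)
    matches = [v for v, p in zip(unlabeled_vecs, pred) if p == 1]
    non_matches = [v for v, p in zip(unlabeled_vecs, pred) if p == 0]
    return matches, non_matches

# Toy usage: two well-separated groups of 2-dimensional "weight vectors"
X = [[0.9, 0.9], [1.0, 0.8], [0.1, 0.1], [0.0, 0.2]]
y = [1, 1, 0, 0]
m, n = split_cluster(X, y, [[0.95, 0.85], [0.05, 0.15]])
```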
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(307, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
(597, 0.6436781609195402, 0.9395876193289701, 0.3563218390804598)
Current size of match and non-match training data sets: 31 / 56
Selected cluster (queue ordering: random) with:
- Purity 0.64 and entropy 0.94
- Size 597 weight vectors
- Estimated match proportion 0.356
Sample size for this cluster: 77
Farthest first selection of 77 weight vectors from 597 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.417, 0.696, 0.824, 0.455, 0.842] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.667, 0.000, 0.800, 0.633, 0.647, 0.500, 0.600] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.583, 0.500, 0.778, 0.647, 0.643] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.667, 0.538, 0.455, 0.581, 0.385] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 0 matches and 77 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 77
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
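The farthest-first selection performed in each loop can be sketched as a greedy max-min traversal: repeatedly pick the vector whose minimum Euclidean distance to the already-selected set is largest. A minimal sketch (the log does not show the seeding rule, so starting from the first vector is an assumption):

```python
import math

def farthest_first(vectors, k):
    """Greedy max-min (farthest-first) selection of k vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]  # assumed seed; the actual start rule is not logged
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # The candidate farthest from its nearest already-selected vector
        far = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(far)
        remaining.remove(far)
    return selected

# Toy usage: from four points on a line, pick the three most spread out
pts = [[0.0], [0.1], [0.5], [1.0]]
print(farthest_first(pts, 3))  # → [[0.0], [1.0], [0.5]]
```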
Reached end of manual classification budget
58.0
Analysing file: diverg(10)758_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (10, 1 - acm diverg, 758), dtype: object
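The precision, recall, and f-measure in this summary are consistent with the reported tp/fp/fn counts; a minimal sketch of the standard formulas:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# tp=57, fp=0, fn=242 as reported above:
p, r, f = prf(57, 0, 242)
print(round(p, 6), round(r, 6), round(f, 6))  # → 1.0 0.190635 0.320225
```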
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)758_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 812
Number of entity ID pairs that occurred more than once: 0
Analyse set of 812 weight vectors
Containing 209 true matches and 603 true non-matches
(25.74% true matches)
Identified 765 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 730 (95.42%)
2 : 32 (4.18%)
3 : 2 (0.26%)
12 : 1 (0.13%)
Identified 1 non-pure unique weight vectors (from 765 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 582
Removed 1 non-pure weight vectors
Final number of weight vectors to use: 811
Number of unique weight vectors: 765
Time to load and analyse the weight vector file: 0.01 sec
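The uniqueness and occurrence-frequency statistics above can be reproduced by counting identical vectors; a minimal sketch using a `Counter` over vector tuples:

```python
from collections import Counter

def frequency_distribution(weight_vectors):
    """Count identical weight vectors, then tabulate how many unique
    vectors occur once, twice, and so on."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    num_unique = len(per_vector)
    # occurrence count -> number of unique vectors with that count
    distribution = Counter(per_vector.values())
    return num_unique, dict(distribution)

# Toy usage: five vectors, one of them duplicated
vecs = [[1.0, 0.5], [1.0, 0.5], [0.2, 0.3], [0.9, 0.1], [0.0, 0.0]]
print(frequency_distribution(vecs))  # → (4, {2: 1, 1: 3})
```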
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(765, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 765 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 765 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 26 matches and 59 non-matches
Purity of oracle classification: 0.694
Entropy of oracle classification: 0.888
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 680 weight vectors
Based on 26 matches and 59 non-matches
Classified 126 matches and 554 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(126, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
(554, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
Current size of match and non-match training data sets: 26 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 554 weight vectors
- Estimated match proportion 0.306
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 554 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 16 matches and 55 non-matches
Purity of oracle classification: 0.775
Entropy of oracle classification: 0.770
Number of true matches: 16
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
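The split decision logged at the end of each oracle round ("not pure enough or too large") can be sketched as a simple two-condition test; the threshold values here are hypothetical, since the log does not state them:

```python
def can_split(purity, size, min_purity=0.95, max_size=100):
    """Whether a cluster still needs splitting: either its oracle-sample
    purity is below the required minimum, or the cluster is too large.
    min_purity and max_size are assumed values, not taken from the log."""
    return purity < min_purity or size > max_size

# E.g. the cluster above: purity 0.775, 483 vectors remaining
print(can_split(0.775, 483))  # → True
```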
Reached end of manual classification budget
57.0
Analysing file: diverg(10)693_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.98
recall 0.327759
f-measure 0.491228
da 100
dm 0
ndm 0
tp 98
fp 2
tn 4.76529e+07
fn 201
Name: (10, 1 - acm diverg, 693), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)693_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 603
Number of entity ID pairs that occurred more than once: 0
Analyse set of 603 weight vectors
Containing 156 true matches and 447 true non-matches
(25.87% true matches)
Identified 569 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 539 (94.73%)
2 : 27 (4.75%)
3 : 2 (0.35%)
4 : 1 (0.18%)
Identified 0 non-pure unique weight vectors (from 569 unique weight vectors)
Pureness (as the proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 142
0.000 : 427
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 603
Number of unique weight vectors: 569
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(569, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 569 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 569 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 27 matches and 55 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 487 weight vectors
Based on 27 matches and 55 non-matches
Classified 144 matches and 343 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
(343, 0.6707317073170732, 0.9141770436147918, 0.32926829268292684)
Current size of match and non-match training data sets: 27 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 144 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 144 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.143, 0.143, 0.143, 0.133, 0.267] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.875, 1.000, 0.182, 0.267, 0.237, 0.206, 0.167] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
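The farthest-first selection logged above can be sketched as follows. This is a minimal illustration only, assuming Euclidean distance and an arbitrary seed vector; the `farthest_first` function and its seeding strategy are hypothetical, not the tool's actual implementation:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: repeatedly pick the vector whose
    distance to its nearest already-selected vector is largest."""
    selected = [vectors[0]]  # seed with an arbitrary starting vector
    # min_dist[i] = distance from vectors[i] to its nearest selected vector
    min_dist = [math.dist(v, vectors[0]) for v in vectors]
    while len(selected) < k:
        idx = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[idx])
        min_dist = [min(d, math.dist(v, vectors[idx]))
                    for d, v in zip(min_dist, vectors)]
    return selected
```

Each round costs one pass over the remaining vectors, so selecting k of n vectors is O(k·n) distance updates.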
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 39 matches and 15 non-matches
Purity of oracle classification: 0.722
Entropy of oracle classification: 0.852
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 15
Number of false non-matches: 0
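The purity and entropy figures reported for each oracle-labelled sample are consistent with the majority-class fraction and the base-2 Shannon entropy of the match/non-match split. A small sketch (the function name is illustrative) reproduces the 0.722 / 0.852 values for the 39-match, 15-non-match sample above:

```python
import math

def purity_entropy(matches, non_matches):
    """Purity = majority-class fraction of the sample; entropy = base-2
    Shannon entropy of the match / non-match split."""
    total = matches + non_matches
    p = matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = purity_entropy(39, 15)
print(round(purity, 3), round(entropy, 3))  # 0.722 0.852
```

The same formula reproduces the other logged pairs, e.g. 50 matches and 3 non-matches give purity 0.943 and entropy 0.314.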
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing file: diverg(15)428_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 428), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)428_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 695
Number of entity ID pairs that occurred more than once: 0
Analyse set of 695 weight vectors
Containing 194 true matches and 501 true non-matches
(27.91% true matches)
Identified 671 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 654 (97.47%)
2 : 14 (2.09%)
3 : 2 (0.30%)
7 : 1 (0.15%)
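An occurrence histogram like the one above can be built by counting hashable weight vectors and then counting the counts. This is a toy sketch with stand-in 2-dimensional data, not the tool's code:

```python
from collections import Counter

# Toy weight vectors, stored as tuples so they are hashable.
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9),
           (0.2, 0.3), (1.0, 0.5), (0.4, 0.4)]

occurrences = Counter(vectors)            # vector -> how often it occurs
unique = len(occurrences)                 # number of unique weight vectors
# occurrence count -> number of unique vectors that occur that often
distribution = Counter(occurrences.values())
print(unique, sorted(distribution.items()))
```

Here (1.0, 0.5) occurs three times and (0.2, 0.3) twice, so the histogram reads 1:2, 2:1, 3:1 over 4 unique vectors.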
Identified 0 non-pure unique weight vectors (from 671 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.000 : 499
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 695
Number of unique weight vectors: 671
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(671, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 671 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 671 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 34 matches and 50 non-matches
Purity of oracle classification: 0.595
Entropy of oracle classification: 0.974
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 587 weight vectors
Based on 34 matches and 50 non-matches
Classified 276 matches and 311 non-matches
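The SVM split logged above (train on the oracle-labelled matches and non-matches, then partition the remaining unlabelled vectors by predicted class) can be sketched with scikit-learn. The library choice, kernel, and data are assumptions for illustration; the log does not say which SVM implementation is used:

```python
from sklearn.svm import SVC

# Oracle-labelled training data (toy 2-D stand-ins for the 7-D weight vectors).
X_train = [[0.9, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.1]]
y_train = [1, 1, 0, 0]  # 1 = match, 0 = non-match

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Split the remaining unlabelled vectors into predicted matches / non-matches,
# which become the two child clusters pushed onto the queue.
X_rest = [[0.95, 0.85], [0.15, 0.15]]
pred = clf.predict(X_rest)
matches = [x for x, p in zip(X_rest, pred) if p == 1]
non_matches = [x for x, p in zip(X_rest, pred) if p == 0]
```

The two predicted partitions correspond to the two clusters that appear in the queue on the next loop iteration.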
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(276, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
(311, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
Current size of match and non-match training data sets: 34 / 50
Selected cluster (queue ordering: random) with:
- Purity 0.60 and entropy 0.97
- Size 276 weight vectors
- Estimated match proportion 0.405
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 276 vectors
The selected farthest weight vectors are:
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 44 matches and 25 non-matches
Purity of oracle classification: 0.638
Entropy of oracle classification: 0.945
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)172_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 172), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)172_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 765
Number of entity ID pairs that occurred more than once: 0
Analyse set of 765 weight vectors
Containing 198 true matches and 567 true non-matches
(25.88% true matches)
Identified 723 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 688 (95.16%)
2 : 32 (4.43%)
3 : 2 (0.28%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 723 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.000 : 547
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 765
Number of unique weight vectors: 723
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(723, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 723 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 723 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 638 weight vectors
Based on 27 matches and 58 non-matches
Classified 144 matches and 494 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(494, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 144 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 144 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing file: diverg(15)42_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979381
recall 0.317726
f-measure 0.479798
da 97
dm 0
ndm 0
tp 95
fp 2
tn 4.76529e+07
fn 204
Name: (15, 1 - acm diverg, 42), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)42_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 997
Number of entity ID pairs that occurred more than once: 0
Analyse set of 997 weight vectors
Containing 170 true matches and 827 true non-matches
(17.05% true matches)
Identified 960 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 929 (96.77%)
2 : 28 (2.92%)
3 : 2 (0.21%)
6 : 1 (0.10%)
Identified 0 non-pure unique weight vectors (from 960 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 153
0.000 : 807
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 997
Number of unique weight vectors: 960
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(960, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 960 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 960 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 25 matches and 62 non-matches
Purity of oracle classification: 0.713
Entropy of oracle classification: 0.865
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
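The purity and entropy reported above are consistent with the majority-class fraction and the binary Shannon entropy (in bits) of the oracle's labels; a minimal sketch (the function name is ours, not from the logged program):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity is the majority-class fraction; entropy is the binary
    Shannon entropy of the match proportion, in bits."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# Values reported in the log for this loop: 25 matches, 62 non-matches
purity, entropy = purity_entropy(25, 62)
print(round(purity, 3), round(entropy, 3))  # → 0.713 0.865
```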
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 873 weight vectors
Based on 25 matches and 62 non-matches
Classified 42 matches and 831 non-matches
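The SVM step trains on the oracle-labelled weight vectors and classifies the remaining vectors of the cluster. The log does not reveal the kernel or parameters, so the sketch below uses scikit-learn's `SVC` with defaults and synthetic stand-in data (both are assumptions; the real input is the 7-dimensional weight vectors):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-ins for oracle-labelled 7-dimensional weight vectors
rng = np.random.default_rng(0)
matches = rng.uniform(0.7, 1.0, size=(25, 7))      # labelled True
non_matches = rng.uniform(0.0, 0.5, size=(62, 7))  # labelled False

X = np.vstack([matches, non_matches])
y = np.array([1] * len(matches) + [0] * len(non_matches))

clf = SVC()  # assumption: default RBF kernel
clf.fit(X, y)

# Classify the remaining (unlabelled) weight vectors of the cluster
remaining = rng.uniform(0.0, 1.0, size=(873, 7))
pred = clf.predict(remaining)
print(int(pred.sum()), int(len(pred) - pred.sum()))  # matches / non-matches
```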
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(42, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
(831, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 25 / 62
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 42 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 28
Farthest first selection of 28 weight vectors from 42 vectors
The selected farthest weight vectors are:
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[0.971, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.833, 1.000, 1.000, 0.935] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.929, 1.000, 1.000, 1.000, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.857, 1.000, 0.941, 0.917] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
Perform oracle with 100.00% accuracy on 28 weight vectors
The oracle will correctly classify 28 weight vectors and wrongly classify 0
Classified 28 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 28 weight vectors (classified by oracle) from cluster
Cluster is pure enough and not too large, add its 42 weight vectors to:
Match training set
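The farthest-first ("far") selection used throughout can be sketched as the classic greedy traversal: repeatedly add the vector whose minimum distance to the already-selected set is largest. The seed point and the Euclidean metric are assumptions, as the log does not state them:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of numeric tuples."""
    selected = [vectors[0]]  # assumption: seed with the first vector
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the vector farthest from its nearest selected neighbour
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(points, 3))  # → [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```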
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 3: Queue length: 1
Number of manual oracle classifications performed: 115
Size, purity, entropy, and estimated match proportion of clusters in queue:
(831, 0.7126436781609196, 0.8652817028791377, 0.28735632183908044)
Current size of match and non-match training data sets: 67 / 62
Selected cluster with (queue ordering: random):
- Purity 0.71 and entropy 0.87
- Size 831 weight vectors
- Estimated match proportion 0.287
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 831 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.647, 0.429] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 16 matches and 56 non-matches
Purity of oracle classification: 0.778
Entropy of oracle classification: 0.764
Number of true matches: 16
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
97.0
Analyzing file: diverg(10)756_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 756), dtype: object
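The precision, recall, and f-measure in the summary above follow directly from the reported tp, fp, and fn counts; a minimal check (the function name is ours):

```python
def prf(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * tp / (2 * tp + fp + fn)  # equivalent to 2PR/(P+R)
    return precision, recall, f_measure

# Counts reported in the summary: tp=45, fp=1, fn=254
p, r, f = prf(tp=45, fp=1, fn=254)
print(round(p, 6), round(r, 6), round(f, 5))  # → 0.978261 0.150502 0.26087
```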
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)756_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 814
Number of entity ID pairs that occurred more than once: 0
Analyse set of 814 weight vectors
Containing 220 true matches and 594 true non-matches
(27.03% true matches)
Identified 758 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 722 (95.25%)
2 : 33 (4.35%)
3 : 2 (0.26%)
20 : 1 (0.13%)
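The uniqueness analysis above (how often each distinct weight vector occurs, and how many vectors share each occurrence count) can be reproduced with a pair of counters; the toy vectors below are hypothetical:

```python
from collections import Counter

# Hypothetical rounded weight vectors; the real input comes from the CSV
vectors = [(1.0, 0.0), (1.0, 0.0),
           (0.5, 0.3), (0.5, 0.3), (0.5, 0.3),
           (0.2, 0.9)]

counts = Counter(vectors)        # occurrences per unique vector
freq = Counter(counts.values())  # occurrence count -> number of vectors
total_unique = len(counts)
for occ in sorted(freq):
    n = freq[occ]
    print(f"{occ} : {n} ({100.0 * n / total_unique:.2f}%)")
```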
Identified 1 non-pure unique weight vector (from 758 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 573
Removed 1 non-pure weight vector
Final number of weight vectors to use: 813
Number of unique weight vectors: 758
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(758, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 758 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 758 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 673 weight vectors
Based on 27 matches and 58 non-matches
Classified 146 matches and 527 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(527, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 146 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 146 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analyzing file: diverg(10)370_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 370), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)370_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 559
Number of entity ID pairs that occurred more than once: 0
Analyse set of 559 weight vectors
Containing 187 true matches and 372 true non-matches
(33.45% true matches)
Identified 535 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 522 (97.57%)
2 : 10 (1.87%)
3 : 2 (0.37%)
11 : 1 (0.19%)
Identified 1 non-pure unique weight vector (from 535 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 163
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 371
Removed 1 non-pure weight vector
Final number of weight vectors to use: 558
Number of unique weight vectors: 535
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(535, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 535 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 535 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.850, 0.733, 0.652, 0.778, 0.474] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 28 matches and 53 non-matches
Purity of oracle classification: 0.654
Entropy of oracle classification: 0.930
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 454 weight vectors
Based on 28 matches and 53 non-matches
Classified 138 matches and 316 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(138, 0.654320987654321, 0.9301497323974337, 0.345679012345679)
(316, 0.654320987654321, 0.9301497323974337, 0.345679012345679)
Current size of match and non-match training data sets: 28 / 53
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 316 weight vectors
- Estimated match proportion 0.346
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 316 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.500, 0.826, 0.429, 0.538, 0.636] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.750, 0.857, 0.235, 0.636, 0.550] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.692, 0.692, 0.727, 0.710, 0.250] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.538, 0.677, 0.316, 0.714, 0.381] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.667, 0.000, 0.800, 0.684, 0.667, 0.529, 0.609] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.846, 0.542, 0.588, 0.579, 0.423] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 3 matches and 65 non-matches
Purity of oracle classification: 0.956
Entropy of oracle classification: 0.261
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
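(Annotation) The purity and entropy figures reported above follow directly from the oracle's match/non-match counts (here 3 matches and 65 non-matches): purity is the majority-class fraction, and entropy is the binary Shannon entropy of the class proportions. A minimal sketch, with illustrative function names not taken from the tool:

```python
import math

def purity(n_match: int, n_nonmatch: int) -> float:
    # Fraction of the sample belonging to the majority class.
    return max(n_match, n_nonmatch) / (n_match + n_nonmatch)

def entropy(n_match: int, n_nonmatch: int) -> float:
    # Binary Shannon entropy (in bits) of the class proportions.
    total = n_match + n_nonmatch
    h = 0.0
    for n in (n_match, n_nonmatch):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

purity(3, 65)   # -> 0.956 (rounded), as reported above
entropy(3, 65)  # -> 0.261 (rounded)
```

The same formulas reproduce the later clusters' statistics as well, e.g. 24 matches / 63 non-matches gives purity 0.724 and entropy 0.850.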
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(15)23_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981818
recall 0.180602
f-measure 0.305085
da 55
dm 0
ndm 0
tp 54
fp 1
tn 4.76529e+07
fn 245
Name: (15, 1 - acm diverg, 23), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)23_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 956
Number of entity ID pairs that occurred more than once: 0
Analyse set of 956 weight vectors
Containing 205 true matches and 751 true non-matches
(21.44% true matches)
Identified 905 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 871 (96.24%)
2 : 31 (3.43%)
3 : 2 (0.22%)
17 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 905 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 174
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 730
Removed 1 non-pure weight vector
Final number of weight vectors to use: 955
Number of unique weight vectors: 905
Time to load and analyse the weight vector file: 0.01 sec
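(Annotation) The load-and-analyse step above amounts to counting how often each unique weight vector occurs, then tabulating those counts. A sketch of the frequency distribution using `collections.Counter` (names and toy data are illustrative, not from the tool):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(map(tuple, vectors))  # count duplicates of each vector
    return Counter(per_vector.values())       # tabulate the counts themselves

# Toy data: one vector occurring 3x, one 2x, one 1x.
vectors = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
           [1.0, 0.5], [1.0, 0.5],
           [0.2, 0.9]]
occurrence_distribution(vectors)  # -> Counter({3: 1, 2: 1, 1: 1})
```

On the real file this yields the table above (871 vectors occurring once, 31 twice, 2 three times, 1 seventeen times, summing to 956 vectors and 905 unique ones).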
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(905, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 905 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 905 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
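(Annotation) The "farthest first" selection that produced the sample above is a greedy k-center traversal: start from one vector and repeatedly add the vector whose minimum distance to the already-selected set is largest, so the sample spreads across the whole cluster. A minimal sketch assuming Euclidean distance (function names are illustrative):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily select k vectors, each maximizing its distance
    to the nearest already-selected vector."""
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

farthest_first([(0.0,), (1.0,), (10.0,), (5.0,)], 3)  # -> [(0.0,), (10.0,), (5.0,)]
```

The naive version above is O(k·n·d) per run; the actual tool may cache the minimum distances, but the selected sample is the same.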
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 818 weight vectors
Based on 24 matches and 63 non-matches
Classified 112 matches and 706 non-matches
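(Annotation) The SVM step above trains on the oracle-labelled sample (24 matches, 63 non-matches) and classifies the remaining 818 vectors, splitting the cluster into a predicted-match and a predicted-non-match sub-cluster that go back onto the queue. A library-free sketch of that split, using a nearest-centroid rule as a stand-in for the SVM (names are illustrative):

```python
def centroid(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def split_cluster(unlabeled, matches, nonmatches):
    """Assign each unlabeled vector to the closer class centroid,
    returning the two predicted sub-clusters."""
    cm, cn = centroid(matches), centroid(nonmatches)
    sqdist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    pred_m, pred_n = [], []
    for v in unlabeled:
        (pred_m if sqdist(v, cm) < sqdist(v, cn) else pred_n).append(v)
    return pred_m, pred_n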
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
55.0
Analysing file: diverg(20)652_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 652), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)652_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1094
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1094 weight vectors
Containing 221 true matches and 873 true non-matches
(20.20% true matches)
Identified 1038 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1002 (96.53%)
2 : 33 (3.18%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1038 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1093
Number of unique weight vectors: 1038
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1038, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1038 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1038 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 950 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 950 non-matches
46.0
Analysing file: diverg(10)748_NEW.csv
<class 'pandas.core.series.Series'>
Current row here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979592
recall 0.32107
f-measure 0.483627
da 98
dm 0
ndm 0
tp 96
fp 2
tn 4.76529e+07
fn 203
Name: (10, 1 - acm diverg, 748), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)748_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 415
Number of entity ID pairs that occurred more than once: 0
Analyse set of 415 weight vectors
Containing 167 true matches and 248 true non-matches
(40.24% true matches)
Identified 396 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 383 (96.72%)
2 : 10 (2.53%)
3 : 2 (0.51%)
6 : 1 (0.25%)
Identified 0 non-pure unique weight vectors (from 396 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 150
0.000 : 246
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 415
Number of unique weight vectors: 396
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(396, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 396 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using "far" method
Farthest first selection of 77 weight vectors from 396 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 36 matches and 41 non-matches
Purity of oracle classification: 0.532
Entropy of oracle classification: 0.997
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 319 weight vectors
Based on 36 matches and 41 non-matches
Classified 254 matches and 65 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(254, 0.5324675324675324, 0.9969562518473083, 0.4675324675324675)
(65, 0.5324675324675324, 0.9969562518473083, 0.4675324675324675)
Current size of match and non-match training data sets: 36 / 41
Selected cluster with (queue ordering: random):
- Purity 0.53 and entropy 1.00
- Size 65 weight vectors
- Estimated match proportion 0.468
Sample size for this cluster: 39
Farthest first selection of 39 weight vectors from 65 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.667, 0.857, 0.588, 0.667, 0.385] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.367, 0.733, 0.417, 0.727, 0.474] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
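The farthest-first selections logged throughout this run can be sketched as a greedy traversal: start from one vector, then repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal NumPy version (function and variable names are illustrative, not taken from the original code):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first selection of k vectors.

    Starts from a random vector, then repeatedly picks the vector
    whose minimum Euclidean distance to the selected set is largest.
    """
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    n = len(vectors)
    k = min(k, n)
    selected = [int(rng.integers(n))]
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

This greedy rule is what makes the sampled vectors spread across the cluster rather than concentrate near its centre, which is why the selections above mix clear matches and clear non-matches.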
Perform oracle with 100.00% accuracy on 39 weight vectors
The oracle will correctly classify 39 weight vectors and wrongly classify 0
Classified 0 matches and 39 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
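The purity and entropy figures reported after each oracle round follow the usual definitions: purity is the majority-class fraction of the classified sample, and entropy is the binary entropy of its match proportion. A sketch (the helper name is illustrative):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity (majority-class fraction) and binary entropy of a labelled sample."""
    total = num_matches + num_non_matches
    purity = max(num_matches, num_non_matches) / total
    p = num_matches / total  # match proportion
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy
```

For this round, `purity_entropy(0, 39)` gives (1.0, 0.0) as reported; for the 34-match/46-non-match round later in the log it gives the (0.575, 0.984) pair shown there.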
*** Warning: Oracle returns an empty match dictionary ***
Deleted 39 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
98.0
Analysing the file: diverg(10)177_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 177), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)177_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 506
Number of entity ID pairs that occurred more than once: 0
Analyse set of 506 weight vectors
Containing 187 true matches and 319 true non-matches
(36.96% true matches)
Identified 482 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 469 (97.30%)
2 : 10 (2.07%)
3 : 2 (0.41%)
11 : 1 (0.21%)
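The occurrence table above (469·1 + 10·2 + 2·3 + 1·11 = 506 vectors, 482 unique) can be reproduced with a `Counter` over the vector tuples (a sketch; the function name is illustrative):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of distinct vectors with that count."""
    vec_counts = Counter(tuple(v) for v in weight_vectors)
    return Counter(vec_counts.values())
```

`sum(k * n for k, n in occurrence_distribution(vs).items())` recovers the total vector count, and `sum(n for n in ...values())` the number of unique vectors.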
Identified 1 non-pure unique weight vector (from 482 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 316
Removed 1 non-pure weight vector
Final number of weight vectors to use: 505
Number of unique weight vectors: 482
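For the non-pure unique vector above (pureness 0.909, i.e. 10 of its 11 occurrences are matches), dropping the minority-class copies can be sketched as below. This is an assumption about the removal policy based on the "minority class weight vectors ... to be removed" message; note that a later file in this log removes all copies of a non-pure vector instead, so the actual policy may vary per run. All names are illustrative:

```python
from collections import defaultdict

def remove_minority_copies(weight_vectors, labels):
    """Drop minority-class copies of unique vectors that have mixed labels."""
    groups = defaultdict(list)
    for vec, lab in zip(weight_vectors, labels):
        groups[tuple(vec)].append(lab)
    kept_vecs, kept_labels = [], []
    for vec, lab in zip(weight_vectors, labels):
        labs = groups[tuple(vec)]
        match_frac = sum(labs) / len(labs)  # pureness of this unique vector
        if 0.0 < match_frac < 1.0:
            majority = match_frac >= 0.5
            if lab != majority:
                continue  # minority-class copy: drop it
        kept_vecs.append(vec)
        kept_labels.append(lab)
    return kept_vecs, kept_labels
```

Under this policy, removing 1 of 506 vectors leaves the 505 reported above, while the unique-vector count stays 482 because the majority copies of the mixed vector survive.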
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(482, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 482 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 482 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 34 matches and 46 non-matches
Purity of oracle classification: 0.575
Entropy of oracle classification: 0.984
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 46
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 402 weight vectors
Based on 34 matches and 46 non-matches
Classified 126 matches and 276 non-matches
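The SVM step trains on the oracle-labelled sample and classifies the rest of the cluster, splitting it into a predicted-match and a predicted-non-match child (here 126 and 276 vectors). A sketch using scikit-learn; the kernel choice and helper name are assumptions, not the original configuration:

```python
from sklearn.svm import SVC

def split_cluster_svm(train_vecs, train_labels, remaining_vecs):
    """Train an SVM on oracle-labelled vectors and split the remainder."""
    clf = SVC(kernel="linear")
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, pred) if p]
    non_matches = [v for v, p in zip(remaining_vecs, pred) if not p]
    return matches, non_matches
```

Both children then go back into the queue, which is why the next loop reports a queue length of 2 with sizes 126 and 276.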
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(126, 0.575, 0.9837082626231857, 0.425)
(276, 0.575, 0.9837082626231857, 0.425)
Current size of match and non-match training data sets: 34 / 46
Selected cluster (queue ordering: random) with:
- Purity 0.57 and entropy 0.98
- Size 126 weight vectors
- Estimated match proportion 0.425
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 126 vectors
The selected farthest weight vectors are:
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing the file: diverg(10)388_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987805
recall 0.270903
f-measure 0.425197
da 82
dm 0
ndm 0
tp 81
fp 1
tn 4.76529e+07
fn 218
Name: (10, 1 - acm diverg, 388), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)388_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 584
Number of entity ID pairs that occurred more than once: 0
Analyse set of 584 weight vectors
Containing 168 true matches and 416 true non-matches
(28.77% true matches)
Identified 564 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 553 (98.05%)
2 : 8 (1.42%)
3 : 2 (0.35%)
9 : 1 (0.18%)
Identified 1 non-pure unique weight vector (from 564 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 150
0.889 : 1 (all weight vectors with this pureness to be removed)
0.000 : 413
Removed 9 non-pure weight vectors
Final number of weight vectors to use: 575
Number of unique weight vectors: 563
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(563, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 563 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 82
Perform initial selection using "far" method
Farthest first selection of 82 weight vectors from 563 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.481, 0.429, 0.750, 0.350, 0.778] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.538, 0.789, 0.353, 0.545, 0.550] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.391, 0.538, 0.455, 0.548, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 82 weight vectors
The oracle will correctly classify 82 weight vectors and wrongly classify 0
Classified 31 matches and 51 non-matches
Purity of oracle classification: 0.622
Entropy of oracle classification: 0.957
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 51
Number of false non-matches: 0
Deleted 82 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 481 weight vectors
Based on 31 matches and 51 non-matches
Classified 125 matches and 356 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 82
Size, purity, entropy, and estimated match proportion of clusters in queue:
(125, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
(356, 0.6219512195121951, 0.956652272148091, 0.3780487804878049)
Current size of match and non-match training data sets: 31 / 51
Selected cluster (queue ordering: random) with:
- Purity 0.62 and entropy 0.96
- Size 125 weight vectors
- Estimated match proportion 0.378
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 125 vectors
The selected farthest weight vectors are:
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.200, 0.200, 0.200, 0.200, 0.214] (False)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 46 matches and 7 non-matches
Purity of oracle classification: 0.868
Entropy of oracle classification: 0.563
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 7
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
82.0
Analysing the file: diverg(20)996_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 996), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)996_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1027
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1027 weight vectors
Containing 223 true matches and 804 true non-matches
(21.71% true matches)
Identified 973 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 936 (96.20%)
2 : 34 (3.49%)
3 : 2 (0.21%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 973 unique weight vectors)
Pureness (as percentage of matches) per unique weight vector:
Pureness : Count
1.000 : 189
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 783
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1026
Number of unique weight vectors: 973
Time to load and analyse the weight vector file: 0.05 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(973, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 973 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 973 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 886 weight vectors
Based on 26 matches and 61 non-matches
Classified 131 matches and 755 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(755, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 755 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 755 vectors
The selected farthest weight vectors are:
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.625, 0.182, 0.417, 0.185, 0.214] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.400, 0.500, 0.579, 0.643, 0.846] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.381, 0.833, 0.579, 0.778, 0.385] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 11 matches and 62 non-matches
Purity of oracle classification: 0.849
Entropy of oracle classification: 0.612
Number of true matches: 11
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
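Each sampling step in this run uses farthest-first traversal: starting from an initial vector, repeatedly add the vector whose minimum distance to the already-selected set is largest, which spreads the sample across the cluster. A self-contained sketch; Euclidean distance and the fixed starting index are assumptions, as the log does not state them:

```python
import math

def farthest_first(vectors, k, start=0):
    """Select k vectors by farthest-first traversal: greedily add the
    vector with the largest minimum Euclidean distance to the chosen set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # minimum distance from each vector to the selected set so far
    min_d = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        nxt = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], dist(v, vectors[nxt]))
    return [vectors[i] for i in selected]
```

Greedy farthest-first selection is what produces the visibly diverse samples above, mixing extreme vectors (all 1.000 or all 0.000 weights) with intermediate ones.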
42.0
Analysing the file: diverg(10)429_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 429), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)429_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 444
Number of entity ID pairs that occurred more than once: 0
Analyse set of 444 weight vectors
Containing 209 true matches and 235 true non-matches
(47.07% true matches)
Identified 410 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 393 (95.85%)
2 : 14 (3.41%)
3 : 2 (0.49%)
17 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 410 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 232
Removed 1 non-pure weight vector
Final number of weight vectors to use: 443
Number of unique weight vectors: 410
Time to load and analyse the weight vector file: 0.00 sec
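The analysis phase above groups identical weight vectors, computes each group's pureness (fraction of true matches among its copies), and removes the minority-class copies of any non-pure group. A sketch of that filtering step (the helper name and data layout are assumptions):

```python
from collections import defaultdict

def remove_minority_of_nonpure(vectors_with_labels):
    """Group identical weight vectors; for any group containing both
    matches and non-matches, keep only the majority-class copies."""
    groups = defaultdict(list)
    for vec, is_match in vectors_with_labels:
        groups[tuple(vec)].append(is_match)
    kept = []
    for vec, labels in groups.items():
        matches = sum(labels)
        if 0 < matches < len(labels):  # non-pure group
            # on a tie, the non-match copies are kept (arbitrary for this sketch)
            majority = matches > len(labels) - matches
            kept.extend((list(vec), majority) for l in labels if l == majority)
        else:
            kept.extend((list(vec), l) for l in labels)
    return kept
```

In the run above this removes exactly one copy: the single minority-class member of the group with pureness 0.941, leaving 443 of the original 444 vectors.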
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(410, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 410 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 410 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.750, 1.000, 0.189, 0.324, 0.147, 0.200, 0.226] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.833, 0.550, 0.500, 0.688] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 39 matches and 39 non-matches
Purity of oracle classification: 0.500
Entropy of oracle classification: 1.000
Number of true matches: 39
Number of false matches: 0
Number of true non-matches: 39
Number of false non-matches: 0
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 332 weight vectors
Based on 39 matches and 39 non-matches
Classified 272 matches and 60 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(272, 0.5, 1.0, 0.5)
(60, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 39 / 39
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 272 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 272 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.913, 1.000, 0.184, 0.175, 0.087, 0.233, 0.167] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.750, 1.000, 0.333, 0.216, 0.139, 0.182, 0.179] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 43 matches and 28 non-matches
Purity of oracle classification: 0.606
Entropy of oracle classification: 0.968
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(15)211_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985294
recall 0.22408
f-measure 0.365123
da 68
dm 0
ndm 0
tp 67
fp 1
tn 4.76529e+07
fn 232
Name: (15, 1 - acm diverg, 211), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)211_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 765
Number of entity ID pairs that occurred more than once: 0
Analyse set of 765 weight vectors
Containing 198 true matches and 567 true non-matches
(25.88% true matches)
Identified 723 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 688 (95.16%)
2 : 32 (4.43%)
3 : 2 (0.28%)
7 : 1 (0.14%)
Identified 0 non-pure unique weight vectors (from 723 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.000 : 547
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 765
Number of unique weight vectors: 723
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(723, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 723 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 723 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 638 weight vectors
Based on 27 matches and 58 non-matches
Classified 144 matches and 494 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(144, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(494, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 494 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 494 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.758, 1.000, 0.250, 0.056, 0.034, 0.154, 0.103] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.767, 0.600, 0.857, 0.636, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.833, 0.826, 0.733, 0.455, 0.588] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.767, 0.667, 0.545, 0.786, 0.773] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.917, 0.786, 0.263, 0.500, 0.556] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.688, 0.545, 0.538] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 3 matches and 68 non-matches
Purity of oracle classification: 0.958
Entropy of oracle classification: 0.253
Number of true matches: 3
Number of false matches: 0
Number of true non-matches: 68
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
68.0
Analysing the file: diverg(20)728_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 728), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)728_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1075
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1075 weight vectors
Containing 208 true matches and 867 true non-matches
(19.35% true matches)
Identified 1028 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 993 (96.60%)
2 : 32 (3.11%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1028 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1074
Number of unique weight vectors: 1028
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1028, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1028 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1028 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
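The farthest-first selection listed above can be sketched as a greedy loop that repeatedly adds the vector whose nearest already-selected vector is farthest away; the seeding rule and the Euclidean metric here are assumptions, since the log does not show them:

```python
import math

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first selection of k indices from vectors:
    each step adds the vector maximising the minimum Euclidean
    distance to the vectors selected so far."""
    k = min(k, len(vectors))
    selected = [seed]
    # min_dist[i] = distance from vectors[i] to its nearest selected vector
    min_dist = [math.dist(v, vectors[seed]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[nxt]))
    return selected
```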
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
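The purity and entropy figures reported for each oracle round follow directly from the match/non-match counts: purity is the majority-class fraction, and entropy the binary Shannon entropy of the split. A minimal sketch:

```python
import math

def purity_and_entropy(matches, non_matches):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy (in bits) of the match/non-match split."""
    total = matches + non_matches
    p = matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy
```

For the 25 matches and 63 non-matches above this gives purity 63/88 ≈ 0.716 and entropy ≈ 0.861, matching the logged values.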
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 940 weight vectors
Based on 25 matches and 63 non-matches
Classified 123 matches and 817 non-matches
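The split step trains a classifier on the oracle-labelled sample and partitions the rest of the cluster by its predictions. A sketch using scikit-learn's `SVC` (the kernel and parameters are assumptions; the log does not reveal them):

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, cluster_vecs):
    """Train an SVM on oracle-labelled weight vectors and split the
    remaining cluster vectors into predicted matches / non-matches."""
    clf = SVC()  # default RBF kernel -- an assumption, not from the log
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(cluster_vecs)
    matches = [v for v, p in zip(cluster_vecs, preds) if p]
    non_matches = [v for v, p in zip(cluster_vecs, preds) if not p]
    return matches, non_matches
```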
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 123 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 123 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 47 matches and 1 non-matches
Purity of oracle classification: 0.979
Entropy of oracle classification: 0.146
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)568_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985915
recall 0.234114
f-measure 0.378378
da 71
dm 0
ndm 0
tp 70
fp 1
tn 4.76529e+07
fn 229
Name: (10, 1 - acm diverg, 568), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)568_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 872
Number of entity ID pairs that occurred more than once: 0
Analyse set of 872 weight vectors
Containing 186 true matches and 686 true non-matches
(21.33% true matches)
Identified 832 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 798 (95.91%)
2 : 31 (3.73%)
3 : 2 (0.24%)
6 : 1 (0.12%)
Identified 0 non-pure unique weight vectors (from 832 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 166
0.000 : 666
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 872
Number of unique weight vectors: 832
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(832, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 832 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 832 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.500, 0.286, 0.333, 0.222, 0.179] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 746 weight vectors
Based on 28 matches and 58 non-matches
Classified 148 matches and 598 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(598, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 598 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 598 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.667, 0.389, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 0.700, 0.545, 0.526, 0.818, 0.722] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 0.000, 0.909, 0.700, 0.500, 0.306, 0.824] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 0 matches and 74 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 74
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
71.0
Analysing file: diverg(15)905_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 905), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)905_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 925
Number of entity ID pairs that occurred more than once: 0
Analyse set of 925 weight vectors
Containing 217 true matches and 708 true non-matches
(23.46% true matches)
Identified 870 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 834 (95.86%)
2 : 33 (3.79%)
3 : 2 (0.23%)
19 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 870 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 687
Removed 1 non-pure weight vector
Final number of weight vectors to use: 924
Number of unique weight vectors: 870
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(870, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 870 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 870 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 784 weight vectors
Based on 28 matches and 58 non-matches
Classified 165 matches and 619 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(165, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(619, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 619 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 619 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.929, 1.000, 0.212, 0.228, 0.250, 0.284, 0.262] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[1.000, 0.000, 0.800, 0.696, 0.882, 0.727, 0.708] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.316, 0.583, 0.435, 0.833, 0.692] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.538, 0.333, 0.611, 0.818, 0.654] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.731, 0.652, 0.583, 0.241, 0.229] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
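Farthest-first selection, used to draw the sample above, greedily adds the vector whose distance to the nearest already-selected vector is largest. A numpy sketch, assuming a Euclidean metric and a fixed seed index (both assumptions; the tool's actual choices may differ):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first selection of k row indices from `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    selected = [start]
    # distance from every vector to its nearest already-selected vector
    min_dist = np.linalg.norm(vectors - vectors[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))  # farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [0.5, 0.5]])
print(farthest_first(pts, 3))  # → [0, 1, 3]
```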
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 1 match and 73 non-matches
Purity of oracle classification: 0.986
Entropy of oracle classification: 0.103
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 73
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analyzing the file: diverg(20)513_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 513), dtype: object
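The f-measure printed in this block is the harmonic mean of precision and recall (F1); for the values reported above:

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# precision and recall from the log block above
print(round(f_measure(0.978261, 0.150502), 5))  # → 0.26087
```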
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)513_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 854
Number of entity ID pairs that occurred more than once: 0
Analyse set of 854 weight vectors
Containing 221 true matches and 633 true non-matches
(25.88% true matches)
Identified 798 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 762 (95.49%)
2 : 33 (4.14%)
3 : 2 (0.25%)
20 : 1 (0.13%)
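The frequency distribution above counts how often each exact weight vector recurs, which with hashable tuples amounts to a two-level `Counter`. A sketch over hypothetical data (the function name is an assumption):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())

vecs = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.3, 0.3), (0.3, 0.3), (0.3, 0.3)]
print(sorted(occurrence_distribution(vecs).items()))  # → [(1, 1), (2, 1), (3, 1)]
```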
Identified 1 non-pure unique weight vector (from 798 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 612
Removed 1 non-pure weight vector
Final number of weight vectors to use: 853
Number of unique weight vectors: 798
Time to load and analyse the weight vector file: 0.01 sec
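Removing the minority class of a non-pure unique vector (here, the single vector with pureness 0.950, occurring 20 times as 19 matches and 1 non-match) can be sketched as follows; the function is hypothetical and only illustrates the bookkeeping:

```python
def drop_minority(duplicate_labels):
    """Labels (True = match) of all duplicates of one unique weight vector:
    keep only the occurrences belonging to the majority class."""
    matches = sum(duplicate_labels)
    keep = matches >= len(duplicate_labels) - matches  # majority label
    return [lab for lab in duplicate_labels if lab == keep]

# the 0.950-pure vector from the log: 19 matches, 1 non-match
print(len(drop_minority([True] * 19 + [False])))  # → 19
```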
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(798, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 798 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 798 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 713 weight vectors
Based on 28 matches and 57 non-matches
Classified 150 matches and 563 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(563, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 563 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 563 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 5 matches and 69 non-matches
Purity of oracle classification: 0.932
Entropy of oracle classification: 0.357
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analyzing the file: diverg(10)847_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 847), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)847_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 780
Number of entity ID pairs that occurred more than once: 0
Analyse set of 780 weight vectors
Containing 205 true matches and 575 true non-matches
(26.28% true matches)
Identified 733 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 698 (95.23%)
2 : 32 (4.37%)
3 : 2 (0.27%)
12 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 733 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 178
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 554
Removed 1 non-pure weight vector
Final number of weight vectors to use: 779
Number of unique weight vectors: 733
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(733, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 733 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 733 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 31 matches and 54 non-matches
Purity of oracle classification: 0.635
Entropy of oracle classification: 0.947
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 648 weight vectors
Based on 31 matches and 54 non-matches
Classified 321 matches and 327 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(321, 0.6352941176470588, 0.9465202215633438, 0.36470588235294116)
(327, 0.6352941176470588, 0.9465202215633438, 0.36470588235294116)
Current size of match and non-match training data sets: 31 / 54
Selected cluster with (queue ordering: random):
- Purity 0.64 and entropy 0.95
- Size 327 weight vectors
- Estimated match proportion 0.365
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 327 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.333, 0.917, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.000, 0.600, 0.818, 0.571, 0.524] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.556, 0.348, 0.467, 0.636, 0.412] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.269, 0.478, 0.750, 0.385, 0.455] (False)
[1.000, 0.000, 0.042, 0.500, 0.550, 0.875, 0.714] (False)
[1.000, 0.000, 0.538, 0.600, 0.471, 0.632, 0.688] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.000, 0.714, 0.545, 0.471, 0.476] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.333, 0.455, 0.688, 0.714, 1.000] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.583, 0.786, 0.842, 0.800, 0.833] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.800, 0.667, 0.381, 0.550, 0.429] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.522, 0.929, 0.526, 0.235, 0.286] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[1.000, 0.000, 0.875, 0.778, 0.471, 0.706, 0.714] (False)
[1.000, 0.000, 0.300, 0.500, 0.810, 0.750, 0.238] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.667, 0.286, 0.556, 0.259, 0.250] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[1.000, 0.000, 0.833, 0.500, 0.368, 0.235, 0.429] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.263, 0.333, 0.708, 0.600, 0.650] (False)
[1.000, 0.000, 0.524, 0.357, 0.833, 0.194, 0.313] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 0.000, 0.375, 0.833, 0.800, 0.583, 0.313] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 0 matches and 70 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 70
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analyzing the file: diverg(15)885_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (15, 1 - acm diverg, 885), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)885_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 432
Number of entity ID pairs that occurred more than once: 0
Analyse set of 432 weight vectors
Containing 184 true matches and 248 true non-matches
(42.59% true matches)
Identified 411 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 401 (97.57%)
2 : 7 (1.70%)
3 : 2 (0.49%)
11 : 1 (0.24%)
Identified 1 non-pure unique weight vector (from 411 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 163
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 247
Removed 1 non-pure weight vector
Final number of weight vectors to use: 431
Number of unique weight vectors: 411
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(411, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 411 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 78
Perform initial selection using "far" method
Farthest first selection of 78 weight vectors from 411 vectors
The selected farthest weight vectors are:
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
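The "far" method above is a farthest-first traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A plausible sketch of that step (the seed choice and Euclidean metric are assumptions, not confirmed by the log):

```python
import math

def farthest_first(points, k, seed=0):
    """Greedy farthest-first selection of k points.

    Keeps, for every point, its minimum distance to the selected set,
    and repeatedly picks the point maximising that minimum distance.
    """
    selected = [points[seed]]
    dists = [math.dist(p, selected[0]) for p in points]
    while len(selected) < k:
        idx = max(range(len(points)), key=lambda i: dists[i])
        selected.append(points[idx])
        # update each point's distance to the (now larger) selected set
        dists = [min(d, math.dist(p, points[idx]))
                 for d, p in zip(dists, points)]
    return selected

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
sel = farthest_first(pts, 3)
print(sel)  # [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
```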
Perform oracle with 100.00% accuracy on 78 weight vectors
The oracle will correctly classify 78 weight vectors and wrongly classify 0
Classified 33 matches and 45 non-matches
Purity of oracle classification: 0.577
Entropy of oracle classification: 0.983
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 45
Number of false non-matches: 0
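The purity and entropy figures reported for the oracle classification are consistent with purity being the majority-class fraction and entropy the binary Shannon entropy of the match/non-match split. A sketch under that assumed definition, reproducing the numbers above:

```python
import math

def purity_entropy(n_matches, n_nonmatches):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy of the match / non-match proportions."""
    total = n_matches + n_nonmatches
    p = n_matches / total
    purity = max(p, 1 - p)
    if p in (0.0, 1.0):
        entropy = 0.0  # a pure cluster has zero entropy
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

# 33 matches and 45 non-matches, as classified above
pu, en = purity_entropy(33, 45)
print(round(pu, 3), round(en, 3))  # 0.577 0.983
```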
Deleted 78 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 333 weight vectors
Based on 33 matches and 45 non-matches
Classified 124 matches and 209 non-matches
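The SVM step trains on the oracle-labelled vectors and classifies the remaining unlabelled ones. A minimal sketch, assuming scikit-learn (the log does not name the library) and a linear kernel; the training vectors and labels here are illustrative toy data:

```python
from sklearn.svm import SVC

# Oracle-labelled weight vectors: 1 = match, 0 = non-match (toy data)
train_X = [[0.9, 0.8], [0.95, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_y = [1, 1, 0, 0]

clf = SVC(kernel="linear")  # kernel choice is an assumption
clf.fit(train_X, train_y)

# Classify the remaining, still-unlabelled weight vectors
remaining = [[0.85, 0.9], [0.15, 0.05]]
pred = clf.predict(remaining)
print(list(pred))  # [1, 0]
```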
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 78
Size, purity, entropy, and estimated match proportion of clusters in queue:
(124, 0.5769230769230769, 0.9828586897127056, 0.4230769230769231)
(209, 0.5769230769230769, 0.9828586897127056, 0.4230769230769231)
Current size of match and non-match training data sets: 33 / 45
Selected cluster with (queue ordering: random):
- Purity 0.58 and entropy 0.98
- Size 209 weight vectors
- Estimated match proportion 0.423
Sample size for this cluster: 65
Farthest first selection of 65 weight vectors from 209 vectors
The selected farthest weight vectors are:
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.911, 1.000, 0.097, 0.025, 0.075, 0.288, 0.486] (False)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.625, 1.000, 0.217, 0.160, 0.151, 0.094, 0.203] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.804, 1.000, 0.091, 0.175, 0.074, 0.069, 0.111] (False)
[1.000, 0.000, 0.625, 0.571, 0.412, 0.474, 0.556] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.367, 1.000, 0.160, 0.170, 0.077, 0.200, 0.178] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.650, 1.000, 0.086, 0.219, 0.143, 0.108, 1.000] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.483, 0.818, 0.773, 0.478, 0.727] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 0.000, 0.636, 0.800, 0.471, 0.333, 0.625] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.737, 1.000, 0.211, 0.071, 0.233, 0.111, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[0.867, 1.000, 0.208, 0.167, 0.194, 0.341, 0.151] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 65 weight vectors
The oracle will correctly classify 65 weight vectors and wrongly classify 0
Classified 9 matches and 56 non-matches
Purity of oracle classification: 0.862
Entropy of oracle classification: 0.580
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 65 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(15)682_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 682), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)682_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1055
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1055 weight vectors
Containing 214 true matches and 841 true non-matches
(20.28% true matches)
Identified 998 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 961 (96.29%)
2 : 34 (3.41%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 998 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 820
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1054
Number of unique weight vectors: 998
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(998, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 998 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 998 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.524, 0.455, 0.417, 0.875, 0.556] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 911 weight vectors
Based on 27 matches and 60 non-matches
Classified 161 matches and 750 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(161, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(750, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 161 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 161 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 46 matches and 9 non-matches
Purity of oracle classification: 0.836
Entropy of oracle classification: 0.643
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 9
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)883_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 883), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)883_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 537
Number of entity ID pairs that occurred more than once: 0
Analyse set of 537 weight vectors
Containing 224 true matches and 313 true non-matches
(41.71% true matches)
Identified 498 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 479 (96.18%)
2 : 16 (3.21%)
3 : 2 (0.40%)
20 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 498 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 310
Removed 1 non-pure weight vector
Final number of weight vectors to use: 536
Number of unique weight vectors: 498
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(498, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 498 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 498 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.545, 0.786, 0.500, 0.444, 0.692] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 33 matches and 47 non-matches
Purity of oracle classification: 0.588
Entropy of oracle classification: 0.978
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 47
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 418 weight vectors
Based on 33 matches and 47 non-matches
Classified 151 matches and 267 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(151, 0.5875, 0.9777945702913884, 0.4125)
(267, 0.5875, 0.9777945702913884, 0.4125)
Current size of match and non-match training data sets: 33 / 47
Selected cluster with (queue ordering: random):
- Purity 0.59 and entropy 0.98
- Size 267 weight vectors
- Estimated match proportion 0.412
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 267 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.778, 0.636, 0.375, 0.556, 0.625] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.818, 0.636, 0.750, 0.563, 0.750] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 6 matches and 63 non-matches
Purity of oracle classification: 0.913
Entropy of oracle classification: 0.426
Number of true matches: 6
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)230_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 230), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)230_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1058
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1058 weight vectors
Containing 209 true matches and 849 true non-matches
(19.75% true matches)
Identified 1011 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 976 (96.54%)
2 : 32 (3.17%)
3 : 2 (0.20%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1011 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1057
Number of unique weight vectors: 1011
Time to load and analyse the weight vector file: 0.01 sec
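The frequency distribution and per-vector pureness reported above can be reproduced with a short sketch; `analyse_vectors` is a hypothetical helper, not part of the tool itself:

```python
from collections import Counter

def analyse_vectors(vectors, labels):
    """Group identical weight vectors, count how often each unique vector
    occurs, and compute its pureness (fraction of true-match labels)."""
    groups = {}  # unique weight vector -> list of its match/non-match labels
    for vec, lab in zip(vectors, labels):
        groups.setdefault(tuple(vec), []).append(lab)
    # Occurrence : number of unique vectors that occur that often
    freq_dist = Counter(len(labs) for labs in groups.values())
    # Pureness per unique vector: share of True labels among its duplicates
    pureness = {vec: sum(labs) / len(labs) for vec, labs in groups.items()}
    return freq_dist, pureness
```

A vector whose pureness is strictly between 0 and 1 (e.g. the 0.917 entry above) is "non-pure": identical feature values occur with both labels, so its minority-class copies are removed before clustering.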
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1011, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1011 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1011 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
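The farthest-first ("far") selection used above is a standard greedy traversal: repeatedly pick the vector whose minimum distance to all already-selected vectors is largest. A minimal sketch, assuming Euclidean distance and a fixed starting vector (the tool's actual metric and start rule are not shown in the log):

```python
import numpy as np

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: select k row indices such that each
    new pick maximises its minimum distance to the picks so far."""
    X = np.asarray(vectors, dtype=float)
    selected = [start]
    # min_dist[i] = distance from vector i to its nearest selected vector
    min_dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

This is why the sampled vectors above are so spread out: near-duplicates of an already-selected vector have a small minimum distance and are never picked.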
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
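The purity and entropy figures reported for each oracle round follow the usual binary-cluster definitions: purity is the majority-class share, and entropy is the base-2 Shannon entropy of the match/non-match split. A sketch:

```python
from math import log2

def purity_entropy(n_matches, n_non_matches):
    """Purity = share of the majority class; entropy = binary Shannon
    entropy (base 2) of the match/non-match label distribution."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:  # define 0 * log2(0) = 0
            entropy -= q * log2(q)
    return purity, entropy
```

For the 23 matches and 64 non-matches classified above this gives purity 0.7356 and entropy 0.8333, matching the queue statistics printed in Loop 2.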
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 924 weight vectors
Based on 23 matches and 64 non-matches
Classified 104 matches and 820 non-matches
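The SVM step trains on the oracle-labelled sample and propagates labels to the rest of the cluster, which is then split into a predicted-match and a predicted-non-match sub-cluster. A minimal scikit-learn sketch; the linear kernel and default parameters are assumptions, as the log does not show the SVM configuration:

```python
from sklearn.svm import SVC

def svm_classify(train_X, train_y, rest_X):
    """Fit an SVM on the oracle-classified weight vectors (True = match)
    and predict match/non-match for the remaining unlabelled vectors."""
    clf = SVC(kernel="linear")  # assumed kernel; not stated in the log
    clf.fit(train_X, train_y)
    return clf.predict(rest_X)
```

Here the 23 + 64 oracle-labelled vectors are the training data, and the 924 remaining vectors are split into 104 predicted matches and 820 predicted non-matches, the two clusters queued in Loop 2.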
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(104, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 104 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 104 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing the file: diverg(20)297_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 297), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)297_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)211_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 211), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)211_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 855
Number of entity ID pairs that occurred more than once: 0
Analyse set of 855 weight vectors
Containing 221 true matches and 634 true non-matches
(25.85% true matches)
Identified 799 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 763 (95.49%)
2 : 33 (4.13%)
3 : 2 (0.25%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 799 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 613
Removed 1 non-pure weight vector
Final number of weight vectors to use: 854
Number of unique weight vectors: 799
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(799, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 799 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 799 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
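The purity and entropy figures reported for this oracle sample (0.671 and 0.914 for 28 matches vs. 57 non-matches) follow the usual majority-class-fraction and binary Shannon-entropy definitions. A minimal sketch (the function name is hypothetical):

```python
from math import log2

def purity_entropy(n_matches, n_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon entropy (base 2)."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:           # 0 * log2(0) is taken as 0
            entropy -= q * log2(q)
    return purity, entropy

purity, entropy = purity_entropy(28, 57)    # the 85 oracle-labelled vectors above
print(round(purity, 3), round(entropy, 3))  # → 0.671 0.914
```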
SVM classification of 714 weight vectors
Based on 28 matches and 57 non-matches
Classified 150 matches and 564 non-matches
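The SVM step above trains on the 28 + 57 oracle-labelled vectors and then labels the 714 remaining ones. As a dependency-free stand-in for that train-then-predict flow (nearest-centroid in place of an SVM; all names are hypothetical):

```python
import math

def centroid(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify_remaining(match_train, nonmatch_train, remaining):
    # "Train": one centroid per class; "predict": nearest centroid wins.
    cm, cn = centroid(match_train), centroid(nonmatch_train)
    return [math.dist(v, cm) < math.dist(v, cn) for v in remaining]  # True = match
```

With an actual SVM (e.g. scikit-learn's `SVC`) the fit/predict structure is the same; only the decision boundary changes.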
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(564, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 564 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 74
Farthest first selection of 74 weight vectors from 564 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.544, 1.000, 0.091, 0.226, 0.255, 0.238, 0.000] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 5 matches and 69 non-matches
Purity of oracle classification: 0.932
Entropy of oracle classification: 0.357
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 69
Number of false non-matches: 0
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analyzing file: diverg(20)360_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 360), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)360_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
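The occurrence histogram above (how many distinct weight vectors appear once, twice, and so on) can be reproduced with two nested Counters; a sketch with a hypothetical function name:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    # Count duplicates of each distinct vector, then histogram those counts:
    # occurrence -> number of distinct vectors occurring that often.
    vec_counts = Counter(map(tuple, weight_vectors))
    return Counter(vec_counts.values())

dist = occurrence_distribution([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# dist[2] == 1 (one vector occurs twice), dist[1] == 1 (one occurs once)
```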
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
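The non-pure removal above (the 0.950-pure vector loses its single minority-class occurrence, taking 1093 vectors down to 1092) can be sketched as follows; the function and variable names are hypothetical:

```python
from collections import defaultdict

def remove_minority_occurrences(vectors, labels):
    # A unique weight vector is "non-pure" when its duplicate occurrences carry
    # both labels; drop the minority-class occurrences so each vector is pure.
    groups = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        groups[tuple(vec)].append(lab)
    kept_vectors, kept_labels = [], []
    for vec, labs in groups.items():
        majority = max(set(labs), key=labs.count)
        for lab in labs:
            if lab == majority:
                kept_vectors.append(list(vec))
                kept_labels.append(lab)
    return kept_vectors, kept_labels
```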
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
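The "farthest first selection" used for sampling is a greedy farthest-first traversal: starting from one vector, repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch (Euclidean distance and the fixed starting index are assumptions):

```python
import math

def farthest_first(vectors, k, start=0):
    # Greedy farthest-first traversal over a list of equal-length vectors.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # Minimum distance from every vector to the current selection.
    min_d = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_d[i] = min(min_d[i], dist(v, vectors[nxt]))
    return selected
```

Selected points are at distance 0 from the selection, so they are never picked again while unselected candidates remain.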
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 817 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 131 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00 accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(15)161_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 161), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)161_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 731
Number of entity ID pairs that occurred more than once: 0
Analyse set of 731 weight vectors
Containing 210 true matches and 521 true non-matches
(28.73% true matches)
Identified 698 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 684 (97.99%)
2 : 11 (1.58%)
3 : 2 (0.29%)
19 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 698 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 520
Removed 1 non-pure weight vector
Final number of weight vectors to use: 730
Number of unique weight vectors: 698
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(698, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 698 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 698 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 29 matches and 55 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.930
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 614 weight vectors
Based on 29 matches and 55 non-matches
Classified 122 matches and 492 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(122, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
(492, 0.6547619047619048, 0.9297432191769048, 0.34523809523809523)
Current size of match and non-match training data sets: 29 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.65 and entropy 0.93
- Size 122 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 122 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 0.867, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 50 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.139
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analyzing file: diverg(10)87_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (10, 1 - acm diverg, 87), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)87_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 361
Number of entity ID pairs that occurred more than once: 0
Analyse set of 361 weight vectors
Containing 203 true matches and 158 true non-matches
(56.23% true matches)
Identified 330 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 316 (95.76%)
2 : 11 (3.33%)
3 : 2 (0.61%)
17 : 1 (0.30%)
Identified 1 non-pure unique weight vector (from 330 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 172
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 157
Removed 1 non-pure weight vector
Final number of weight vectors to use: 360
Number of unique weight vectors: 330
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(330, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 330 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 74
Perform initial selection using "far" method
Farthest first selection of 74 weight vectors from 330 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
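The "far" selection above can be sketched as a greedy farthest-first traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch under assumed defaults (Euclidean distance, first vector as seed; the tool's actual seed and metric are not shown in the log):

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first selection of k vectors."""
    selected = [vectors[0]]           # arbitrary seed vector
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # pick the remaining vector farthest from its nearest selected one
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```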
Perform oracle with 100.00% accuracy on 74 weight vectors
The oracle will correctly classify 74 weight vectors and wrongly classify 0
Classified 50 matches and 24 non-matches
Purity of oracle classification: 0.676
Entropy of oracle classification: 0.909
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 24
Number of false non-matches: 0
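The purity and entropy reported for each oracle split follow the usual definitions: purity is the majority-class fraction and entropy the base-2 Shannon entropy of the match/non-match proportions (for the 50/24 split above, 50/74 ≈ 0.676 and ≈ 0.909 bits). A minimal sketch:

```python
import math

def purity_entropy(n_match, n_non_match):
    """Majority-class fraction and base-2 Shannon entropy of a
    match / non-match split."""
    total = n_match + n_non_match
    p = n_match / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)
    return purity, entropy
```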
Deleted 74 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large; it can be split further
SVM classification of 256 weight vectors
Based on 50 matches and 24 non-matches
Classified 256 matches and 0 non-matches
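The SVM step trains on the oracle-labelled sample and classifies the rest of the cluster; a minimal sketch using scikit-learn's `SVC` (the kernel and parameters actually used by the tool are not shown in the log, so library defaults are assumed):

```python
from sklearn.svm import SVC

def svm_classify(train_vectors, train_labels, unlabelled_vectors):
    """Fit an SVM on oracle-classified weight vectors and predict
    match (True) / non-match (False) for the unlabelled ones."""
    clf = SVC()  # default RBF kernel; the real settings are not logged
    clf.fit(train_vectors, train_labels)
    return clf.predict(unlabelled_vectors)

# toy usage: high similarity weights ~ matches, low weights ~ non-matches
preds = svm_classify(
    [[0.9, 1.0], [1.0, 0.8], [0.1, 0.2], [0.2, 0.0]],
    [True, True, False, False],
    [[0.95, 0.9], [0.05, 0.1]],
)
```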
53.0
Analysing file: diverg(10)478_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (10, 1 - acm diverg, 478), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)478_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 380
Number of entity ID pairs that occurred more than once: 0
Analyse set of 380 weight vectors
Containing 216 true matches and 164 true non-matches
(56.84% true matches)
Identified 347 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 331 (95.39%)
2 : 13 (3.75%)
3 : 2 (0.58%)
17 : 1 (0.29%)
Identified 1 non-pure unique weight vector (from 347 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 183
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 163
Removed 1 non-pure weight vector
Final number of weight vectors to use: 379
Number of unique weight vectors: 347
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(347, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 347 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 75
Perform initial selection using "far" method
Farthest first selection of 75 weight vectors from 347 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.667, 0.750, 0.417, 0.444, 0.750] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.747, 1.000, 0.222, 0.314, 0.212, 0.108, 0.277] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.857, 0.944, 0.214, 0.118, 0.111, 0.125, 0.000] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 75 weight vectors
The oracle will correctly classify 75 weight vectors and wrongly classify 0
Classified 46 matches and 29 non-matches
Purity of oracle classification: 0.613
Entropy of oracle classification: 0.963
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 29
Number of false non-matches: 0
Deleted 75 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large; it can be split further
SVM classification of 272 weight vectors
Based on 46 matches and 29 non-matches
Classified 272 matches and 0 non-matches
42.0
Analysing file: diverg(20)442_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.979167
recall 0.157191
f-measure 0.270893
da 48
dm 0
ndm 0
tp 47
fp 1
tn 4.76529e+07
fn 252
Name: (20, 1 - acm diverg, 442), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)442_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 724
Number of entity ID pairs that occurred more than once: 0
Analyse set of 724 weight vectors
Containing 212 true matches and 512 true non-matches
(29.28% true matches)
Identified 671 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 636 (94.78%)
2 : 32 (4.77%)
3 : 2 (0.30%)
18 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 671 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 179
0.944 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 491
Removed 1 non-pure weight vector
Final number of weight vectors to use: 723
Number of unique weight vectors: 671
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(671, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 671 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 671 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large; it can be split further
SVM classification of 587 weight vectors
Based on 27 matches and 57 non-matches
Classified 142 matches and 445 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(445, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 445 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 445 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 8 matches and 62 non-matches
Purity of oracle classification: 0.886
Entropy of oracle classification: 0.513
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large; it can be split further
Reached end of manual classification budget
48.0
Analysing file: diverg(20)229_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 229), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)229_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.667, 0.722, 0.471, 0.545, 0.579] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.556, 0.182, 0.500, 0.071, 0.400] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 1.000, 1.000, 0.963, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[0.344, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.875, 0.267, 0.294, 0.296, 0.250] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.033, 0.300, 0.524, 0.727, 0.762] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 24 matches and 62 non-matches
Purity of oracle classification: 0.721
Entropy of oracle classification: 0.854
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 62
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 24 matches and 62 non-matches
Classified 13 matches and 706 non-matches
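The SVM step trains on the oracle-labelled sample (24 matches, 62 non-matches) and splits the remaining unlabelled vectors into a predicted-match and a predicted-non-match sub-cluster. The log does not show the SVM implementation; as a stand-in, here is a minimal primal linear SVM trained with Pegasos-style sub-gradient updates on the hinge loss (the bias handling is a common heuristic, and all names here are assumptions):

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent.
    X: list of feature vectors, y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # regularisation shrink, then step towards hinge-loss violators
            w = [(1.0 - eta * lam) * wj for wj in w]
            if y[i] * score < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def svm_classify(w, b, x):
    """Return +1 (predicted match) or -1 (predicted non-match)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1

# toy usage: two "match-like" and two "non-match-like" weight vectors
X = [[1.0, 1.0], [0.95, 0.9], [0.0, 0.1], [0.1, 0.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

Classifying each remaining weight vector with `svm_classify` and partitioning by the result yields the two sub-clusters pushed onto the queue.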
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(13, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
(706, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 24 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 13 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 12
Farthest first selection of 12 weight vectors from 13 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.958, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.971, 0.952, 1.000] (True)
[1.000, 1.000, 1.000, 0.952, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.980, 1.000] (True)
[0.971, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.933, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
Perform oracle with 100.00% accuracy on 12 weight vectors
The oracle will correctly classify 12 weight vectors and wrongly classify 0
Classified 12 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 12
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 12 weight vectors (classified by oracle) from cluster
Cluster is pure enough and not too large, add its 13 weight vectors to:
Match training set
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
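The "Farthest first selection" step that opens each loop can be sketched as a greedy farthest-first traversal: starting from a seed vector, repeatedly pick the vector whose minimum Euclidean distance to the already-selected set is largest. This is a sketch; the program's actual seeding rule and distance metric are not shown in the log:

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedily select k vectors that are spread out, each maximising
    its minimum distance to the vectors already selected."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    selected = [seed_index]
    # running minimum distance from every vector to the selected set
    min_dist = [dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return [vectors[i] for i in selected]

points = [[0.0], [0.1], [0.5], [1.0]]
print(farthest_first(points, 3))  # [[0.0], [1.0], [0.5]]
```

On the 1-D toy input, the traversal picks the two extremes first and then the midpoint, which is the spreading behaviour the sampled weight vectors above exhibit.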
Loop 3: Queue length: 1
Number of manual oracle classifications performed: 98
Size, purity, entropy, and estimated match proportion of clusters in queue:
(706, 0.7209302325581395, 0.8541802051521675, 0.27906976744186046)
Current size of match and non-match training data sets: 37 / 62
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.279
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.462, 0.409, 0.833, 0.263, 0.688] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.462, 0.889, 0.455, 0.211, 0.375] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.440, 0.786, 0.545, 0.389, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 0.000, 0.769, 0.500, 0.529, 0.818, 0.789] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.583, 0.444, 0.412, 0.318, 0.421] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.067, 0.650, 0.579, 0.500, 0.286] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.692, 0.292, 0.500, 0.818, 0.308] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 22 matches and 48 non-matches
Purity of oracle classification: 0.686
Entropy of oracle classification: 0.898
Number of true matches: 22
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)636_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 636), dtype: object
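The precision, recall, and f-measure fields in the Series above are consistent with the tp/fp/fn counts it reports. A minimal sketch of that relation (the function name `prf` is hypothetical):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

p, r, f = prf(57, 0, 242)
print(round(p, 3), round(r, 6), round(f, 6))  # 1.0 0.190635 0.320225
```

With tp=57, fp=0, fn=242 this reproduces the logged precision 1, recall 0.190635, and f-measure 0.320225.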
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)636_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
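A frequency distribution like the one above can be built with two nested Counters: one counting how often each (hashable) weight vector appears, and one counting how many unique vectors share each occurrence count. A sketch with hypothetical names:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map occurrence count -> number of unique vectors occurring that often."""
    per_vector = Counter(tuple(v) for v in weight_vectors)
    return Counter(per_vector.values())

vecs = [[1.0, 0.5]] * 3 + [[0.2, 0.2]] * 2 + [[0.9, 0.1]]
print(sorted(occurrence_distribution(vecs).items()))  # [(1, 1), (2, 1), (3, 1)]
```

For the run above, the outer Counter would map 1 to 1001, 2 to 32, 3 to 2, and 12 to 1.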
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
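The non-pure vector removal step groups identical weight vectors, and for any group whose copies carry mixed match/non-match labels, drops the minority-class copies (here, the single minority copy of the 0.917-pure vector). A sketch under that reading, with hypothetical names:

```python
from collections import defaultdict

def remove_non_pure(weight_vectors, labels):
    """For each unique vector, keep only copies carrying the majority label;
    minority-label copies of a mixed ('non-pure') vector are removed."""
    groups = defaultdict(list)
    for vec, lab in zip(weight_vectors, labels):
        groups[tuple(vec)].append(lab)
    kept = []
    for vec, lab in zip(weight_vectors, labels):
        labs = groups[tuple(vec)]
        majority = sum(labs) * 2 >= len(labs)  # True iff matches are the majority
        if lab == majority:
            kept.append((vec, lab))
    return kept

wvecs = [[1.0]] * 3 + [[0.0]] * 2
wlabels = [True, True, False, False, False]
kept = remove_non_pure(wvecs, wlabels)  # drops the lone False copy of [1.0]
```

In the toy usage, the mixed group `[1.0]` (two matches, one non-match) loses its non-match copy, while the pure group `[0.0]` is kept whole.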
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 101 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 43
Farthest first selection of 43 weight vectors from 101 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 43 weight vectors
The oracle will correctly classify 43 weight vectors and wrongly classify 0
Classified 42 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.159
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 43 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(20)558_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 558), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)558_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)601_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 601), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)601_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
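The "far" initial selection above is a greedy farthest-first traversal: starting from one vector, each new pick maximises the minimum distance to everything selected so far, which spreads the sample across the weight-vector space. A minimal sketch (Euclidean distance and a fixed starting index are assumptions; the actual metric and seed are not shown in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily select k indices from `vectors`: each new pick maximises
    the minimum Euclidean distance to the vectors selected so far."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [start]
    # Minimum distance from every vector to the current selection.
    min_dist = [dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], dist(v, vectors[nxt]))
    return selected
```

This explains why the selected vectors above are dominated by extreme values (many 0.000 and 1.000 components): corner points of the space are mutually farthest.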
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
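The purity and entropy figures reported after each oracle round follow the usual binary definitions: with m matches and n non-matches, purity is the majority-class fraction and entropy is the Shannon entropy (in bits) of the match proportion. For the 23 / 65 split above:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity (majority fraction) and binary Shannon entropy of an
    oracle-classified sample."""
    total = num_matches + num_non_matches
    p = num_matches / total                 # match proportion
    purity = max(p, 1 - p)                  # fraction of the majority class
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:
            entropy -= q * math.log2(q)     # Shannon entropy in bits
    return purity, entropy

purity, entropy = purity_entropy(23, 65)
# purity ≈ 0.739, entropy ≈ 0.829, matching the log above
```

The same pair (0.7386…, 0.8288…) is what appears as the purity and entropy of both child clusters in the Loop 2 queue, since the children inherit the parent sample's estimates.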
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
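The SVM step trains on all oracle-labelled vectors collected so far (23 matches, 65 non-matches) and splits the remaining 956 unlabelled vectors into predicted matches and non-matches, which become the two queue clusters of Loop 2. A minimal scikit-learn sketch; the kernel and parameters are assumptions, as the log does not show them:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, cluster_vecs):
    """Train an SVM on the oracle-labelled weight vectors and split the
    remaining cluster into predicted matches and non-matches."""
    clf = SVC(kernel="linear")              # assumed kernel
    clf.fit(np.asarray(train_vecs), np.asarray(train_labels))
    pred = clf.predict(np.asarray(cluster_vecs))
    matches = [v for v, p in zip(cluster_vecs, pred) if p]
    non_matches = [v for v, p in zip(cluster_vecs, pred) if not p]
    return matches, non_matches
```

Here that split yields the 109-vector and 847-vector clusters queued in Loop 2.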
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-match
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)335_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 335), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)335_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 839
Number of entity ID pairs that occurred more than once: 0
Analyse set of 839 weight vectors
Containing 213 true matches and 626 true non-matches
(25.39% true matches)
Identified 785 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 750 (95.54%)
2 : 32 (4.08%)
3 : 2 (0.25%)
19 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 785 unique weight vectors)
Pureness (fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 179
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 605
Removed 1 non-pure weight vector
Final number of weight vectors to use: 838
Number of unique weight vectors: 785
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(785, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 785 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 785 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 700 weight vectors
Based on 27 matches and 58 non-matches
Classified 137 matches and 563 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(137, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(563, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 137 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 137 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing the file: diverg(20)426_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 426), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)426_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (fraction of matches) per unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
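The purity and entropy figures above follow directly from the oracle's match/non-match counts; a minimal sketch, assuming majority-class purity and binary Shannon entropy (which reproduces the logged values exactly):

```python
import math

def purity_and_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class in the sample;
    entropy = binary Shannon entropy of the match/non-match split
    (0.0 = perfectly pure, 1.0 = a 50/50 split)."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

purity, entropy = purity_and_entropy(14, 54)  # counts from the log above
print(round(purity, 3), round(entropy, 3))    # -> 0.794 0.734
```

The same function also reproduces the later clusters, e.g. 27 matches / 60 non-matches gives purity 0.690 and entropy 0.894 as logged below.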
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)231_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (20, 1 - acm diverg, 231), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)231_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1069
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1069 weight vectors
Containing 221 true matches and 848 true non-matches
(20.67% true matches)
Identified 1013 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 977 (96.45%)
2 : 33 (3.26%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1013 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 185
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 827
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1068
Number of unique weight vectors: 1013
Time to load and analyse the weight vector file: 0.01 sec
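The loading/analysis step above (duplicate counting, per-vector pureness, removal of minority-class occurrences) can be sketched as follows; `analyse_weight_vectors` and its exact filtering rule are an assumed reconstruction, not the tool's actual code:

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Assumed reconstruction of the analysis step logged above:
    count how often each unique weight vector occurs, compute its
    pureness (fraction of matches among its occurrences), and drop
    the minority-class occurrences of any non-pure unique vector."""
    freq = Counter(map(tuple, vectors))                 # occurrence distribution
    matches = Counter(tuple(v) for v, m in zip(vectors, labels) if m)
    kept = []
    for v, is_match in zip(vectors, labels):
        key = tuple(v)
        pureness = matches[key] / freq[key]
        if pureness in (0.0, 1.0) or is_match == (pureness >= 0.5):
            kept.append((v, is_match))                  # keep pure / majority side
    return freq, kept

# toy example: a vector seen 3 times with 2 matches is 2/3 pure,
# so its single non-match occurrence is removed
freq, kept = analyse_weight_vectors([[0.5], [0.5], [0.5], [0.9]],
                                    [True, True, False, True])
print(len(kept))  # -> 3
```

In the run above this rule removes exactly one vector: the unique vector occurring 20 times with pureness 0.950 loses its single non-match occurrence, leaving 1068 of 1069 vectors.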
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1013, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1013 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1013 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.300, 0.786, 0.818, 0.778, 0.846] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
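The "far" initial selection used above repeatedly picks the vector farthest (here assumed Euclidean distance) from everything selected so far; a greedy Gonzalez-style sketch, where the seed choice and metric are assumptions:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal: seed with the first vector,
    then repeatedly add the vector whose distance to its nearest
    already-selected vector is largest."""
    selected = [vectors[0]]
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

picks = farthest_first([(0, 0), (10, 0), (5, 0), (0, 10)], 3)
print(picks)  # -> [(0, 0), (10, 0), (0, 10)]
```

The greedy rule tends to cover the extremes of the space first, which is why the sampled vectors above mix near-all-1.0 and near-all-0.0 weight vectors.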
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 27 matches and 60 non-matches
Purity of oracle classification: 0.690
Entropy of oracle classification: 0.894
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 926 weight vectors
Based on 27 matches and 60 non-matches
Classified 142 matches and 784 non-matches
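The SVM step trains on the oracle-labelled sample and splits the remaining cluster by predicted class; a sketch with scikit-learn on stand-in random data (the linear kernel and the features are assumptions — the log only states that an SVM was trained on 27 matches and 60 non-matches):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# stand-in data with the same shape as the log: 27 match and 60
# non-match training vectors (7 weights each), 926 unlabelled vectors
X_train = rng.random((87, 7))
y_train = np.array([1] * 27 + [0] * 60)
X_rest = rng.random((926, 7))

clf = SVC(kernel="linear").fit(X_train, y_train)
pred = clf.predict(X_rest)
match_cluster = X_rest[pred == 1]       # becomes one new queue entry
non_match_cluster = X_rest[pred == 0]   # becomes the other
print(len(match_cluster) + len(non_match_cluster))  # -> 926
```

The two predicted groups then re-enter the queue as the two clusters shown in Loop 2 below.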
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(142, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
(784, 0.6896551724137931, 0.8935711016541907, 0.3103448275862069)
Current size of match and non-match training data sets: 27 / 60
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 142 weight vectors
- Estimated match proportion 0.310
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 142 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 50 matches and 2 non-matches
Purity of oracle classification: 0.962
Entropy of oracle classification: 0.235
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing the file: diverg(15)322_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 322), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)322_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 714
Number of entity ID pairs that occurred more than once: 0
Analyse set of 714 weight vectors
Containing 220 true matches and 494 true non-matches
(30.81% true matches)
Identified 678 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 662 (97.64%)
2 : 13 (1.92%)
3 : 2 (0.29%)
20 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 678 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 493
Removed 1 non-pure weight vector
Final number of weight vectors to use: 713
Number of unique weight vectors: 678
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(678, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 678 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 678 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 31 matches and 53 non-matches
Purity of oracle classification: 0.631
Entropy of oracle classification: 0.950
Number of true matches: 31
Number of false matches: 0
Number of true non-matches: 53
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 594 weight vectors
Based on 31 matches and 53 non-matches
Classified 320 matches and 274 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(320, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
(274, 0.6309523809523809, 0.9499380214234903, 0.36904761904761907)
Current size of match and non-match training data sets: 31 / 53
Selected cluster with (queue ordering: random):
- Purity 0.63 and entropy 0.95
- Size 274 weight vectors
- Estimated match proportion 0.369
Sample size for this cluster: 67
Farthest first selection of 67 weight vectors from 274 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.731, 0.792, 0.609, 0.867, 0.636] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.895, 0.625, 0.750, 0.278, 0.188] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.462, 0.667, 0.600, 0.389, 0.615] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.500, 0.600, 0.722, 0.643] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.231, 0.609, 0.643, 0.722, 0.846] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 0.000, 0.375, 0.619, 0.400, 0.778, 0.714] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 0.000, 0.786, 0.591, 0.273, 0.522, 0.450] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 0.818, 0.667, 0.458, 0.333, 0.229] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.571, 0.867, 0.471, 0.583, 0.643] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 67 weight vectors
The oracle will correctly classify 67 weight vectors and wrongly classify 0
Classified 0 matches and 67 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 67 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)985_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984375
recall 0.210702
f-measure 0.347107
da 64
dm 0
ndm 0
tp 63
fp 1
tn 4.76529e+07
fn 236
Name: (10, 1 - acm diverg, 985), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)985_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 998
Number of entity ID pairs that occurred more than once: 0
Analyse set of 998 weight vectors
Containing 199 true matches and 799 true non-matches
(19.94% true matches)
Identified 948 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 914 (96.41%)
2 : 31 (3.27%)
3 : 2 (0.21%)
16 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 948 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 778
Removed 1 non-pure weight vector
Final number of weight vectors to use: 997
Number of unique weight vectors: 948
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(948, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 948 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 948 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 32 matches and 55 non-matches
Purity of oracle classification: 0.632
Entropy of oracle classification: 0.949
Number of true matches: 32
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
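The purity and entropy figures reported above follow directly from the match/non-match counts: purity is the majority-class fraction, and entropy is the binary Shannon entropy of the match proportion. A minimal Python sketch (the function name is hypothetical):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction; entropy = binary Shannon
    entropy of the match proportion p = matches / total."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    if p in (0.0, 1.0):
        return purity, 0.0
    entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return purity, entropy

# 32 matches / 55 non-matches reproduces the 0.632 purity
# and 0.949 entropy reported in this block.
```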
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 861 weight vectors
Based on 32 matches and 55 non-matches
Classified 282 matches and 579 non-matches
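The SVM step above fits a classifier on the oracle-labelled vectors and partitions the remaining unlabelled vectors by predicted class. A hedged scikit-learn sketch (the kernel and parameters are assumptions, not necessarily the settings used in this run):

```python
from sklearn.svm import SVC

def svm_split(train_vecs, train_labels, remaining_vecs):
    """Fit an SVM on oracle-labelled weight vectors, then split the
    remaining vectors by predicted class (True = match)."""
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(train_vecs, train_labels)
    preds = clf.predict(remaining_vecs)
    matches = [v for v, p in zip(remaining_vecs, preds) if p]
    non_matches = [v for v, p in zip(remaining_vecs, preds) if not p]
    return matches, non_matches
```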
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(282, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
(579, 0.632183908045977, 0.9489804585630242, 0.367816091954023)
Current size of match and non-match training data sets: 32 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.63 and entropy 0.95
- Size 282 weight vectors
- Estimated match proportion 0.368
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 282 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
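The "farthest first" selections above are a greedy max-min traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest, spreading the sample across the weight-vector space. A minimal Python sketch (seeding, the squared-Euclidean metric, and tie-breaking are assumptions):

```python
import random

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first selection of k vectors: repeatedly add the
    vector maximising its minimum distance to the current selection."""
    rng = random.Random(seed)
    selected = [tuple(rng.choice(vectors))]
    pool = {tuple(v) for v in vectors} - set(selected)
    while len(selected) < k and pool:
        def min_dist(v):
            return min(sum((a - b) ** 2 for a, b in zip(v, s))
                       for s in selected)
        best = max(pool, key=min_dist)
        selected.append(best)
        pool.remove(best)
    return [list(v) for v in selected]
```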
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 46 matches and 22 non-matches
Purity of oracle classification: 0.676
Entropy of oracle classification: 0.908
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 22
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
64.0
Analysing file: diverg(10)355_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.99
recall 0.331104
f-measure 0.496241
da 100
dm 0
ndm 0
tp 99
fp 1
tn 4.76529e+07
fn 200
Name: (10, 1 - acm diverg, 355), dtype: object
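The precision, recall, and f-measure values in blocks like the one above are consistent with the standard F1 definition, the harmonic mean of precision and recall. A one-line Python sketch:

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# precision 0.99 and recall 0.331104 give the 0.496241 reported above.
```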
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)355_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 675
Number of entity ID pairs that occurred more than once: 0
Analyse set of 675 weight vectors
Containing 161 true matches and 514 true non-matches
(23.85% true matches)
Identified 657 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 649 (98.78%)
2 : 5 (0.76%)
3 : 2 (0.30%)
10 : 1 (0.15%)
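Frequency tables like the one above can be reproduced with two nested counts: occurrences per distinct weight vector, then distinct vectors per occurrence count. A Python sketch:

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct weight vectors
    that occur that often (the Occurrence : Number table above)."""
    per_vector = Counter(tuple(v) for v in vectors)
    return Counter(per_vector.values())
```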
Identified 1 non-pure unique weight vector (from 657 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 143
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 513
Removed 1 non-pure weight vector
Final number of weight vectors to use: 674
Number of unique weight vectors: 657
Time to load and analyse the weight vector file: 0.01 sec
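The pureness analysis above treats each unique weight vector's pureness as the fraction of its occurrences labelled as true matches; vectors with pureness strictly between 0 and 1 are the non-pure ones flagged for removal. A Python sketch (the function name is hypothetical):

```python
from collections import defaultdict

def pureness(vectors, is_match):
    """Fraction of true-match occurrences per unique weight vector."""
    counts = defaultdict(lambda: [0, 0])  # vector -> [matches, total]
    for v, m in zip(vectors, is_match):
        key = tuple(v)
        counts[key][0] += int(m)
        counts[key][1] += 1
    return {v: matches / total for v, (matches, total) in counts.items()}
```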
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(657, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 657 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 657 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.769, 0.850, 0.353, 0.500, 0.750] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.909, 0.786, 0.583, 0.444, 0.375] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.700, 0.800, 0.833, 0.647, 0.857] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 27 matches and 57 non-matches
Purity of oracle classification: 0.679
Entropy of oracle classification: 0.906
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 573 weight vectors
Based on 27 matches and 57 non-matches
Classified 80 matches and 493 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(80, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
(493, 0.6785714285714286, 0.9059282160429992, 0.32142857142857145)
Current size of match and non-match training data sets: 27 / 57
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 493 weight vectors
- Estimated match proportion 0.321
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 493 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.889, 0.875, 0.375, 0.667, 0.533] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.462, 0.609, 0.643, 0.706, 0.786] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.614, 1.000, 0.208, 0.170, 0.216, 0.273, 0.333] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.300, 0.684, 0.833, 0.556, 0.433] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.704, 0.600, 0.333, 0.370, 0.188] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.267, 0.733, 0.471, 0.833, 0.526] (False)
[1.000, 0.000, 0.579, 0.867, 0.500, 0.574, 0.333] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 17 matches and 54 non-matches
Purity of oracle classification: 0.761
Entropy of oracle classification: 0.794
Number of true matches: 17
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
100.0
Analysing file: diverg(10)817_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 817), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)817_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 408
Number of entity ID pairs that occurred more than once: 0
Analyse set of 408 weight vectors
Containing 180 true matches and 228 true non-matches
(44.12% true matches)
Identified 387 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 377 (97.42%)
2 : 7 (1.81%)
3 : 2 (0.52%)
11 : 1 (0.26%)
Identified 1 non-pure unique weight vector (from 387 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 159
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 227
Removed 1 non-pure weight vector
Final number of weight vectors to use: 407
Number of unique weight vectors: 387
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(387, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 387 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 77
Perform initial selection using "far" method
Farthest first selection of 77 weight vectors from 387 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.727, 0.733, 0.294, 0.667, 0.643] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 77 weight vectors
The oracle will correctly classify 77 weight vectors and wrongly classify 0
Classified 36 matches and 41 non-matches
Purity of oracle classification: 0.532
Entropy of oracle classification: 0.997
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 41
Number of false non-matches: 0
Deleted 77 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 310 weight vectors
Based on 36 matches and 41 non-matches
Classified 117 matches and 193 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 77
Size, purity, entropy, and estimated match proportion of clusters in queue:
(117, 0.5324675324675324, 0.9969562518473083, 0.4675324675324675)
(193, 0.5324675324675324, 0.9969562518473083, 0.4675324675324675)
Current size of match and non-match training data sets: 36 / 41
Selected cluster (queue ordering: random) with:
- Purity 0.53 and entropy 1.00
- Size 117 weight vectors
- Estimated match proportion 0.468
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 117 vectors
The selected farthest weight vectors are:
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.900, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 47 matches and 6 non-matches
Purity of oracle classification: 0.887
Entropy of oracle classification: 0.510
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 6
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(10)986_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (10, 1 - acm diverg, 986), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)986_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 687
Number of entity ID pairs that occurred more than once: 0
Analyse set of 687 weight vectors
Containing 191 true matches and 496 true non-matches
(27.80% true matches)
Identified 663 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 646 (97.44%)
2 : 14 (2.11%)
3 : 2 (0.30%)
7 : 1 (0.15%)
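The occurrence distribution above can be computed in two counting passes, e.g. with `collections.Counter` (a sketch; the program's actual implementation is not shown in the log):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map each occurrence count to the number of unique weight
    vectors that occur exactly that often."""
    per_vector = Counter(map(tuple, weight_vectors))  # vector -> count
    return dict(Counter(per_vector.values()))         # count -> #vectors

# e.g. one vector occurring twice, two vectors occurring once
vecs = [[0.5, 1.0], [0.5, 1.0], [0.2, 0.3], [1.0, 0.0]]
dist = occurrence_distribution(vecs)
```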
Identified 0 non-pure unique weight vectors (from 663 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 169
0.000 : 494
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 687
Number of unique weight vectors: 663
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(663, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 663 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 663 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.435, 0.500, 0.500, 0.647, 0.476] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.229, 0.227, 0.125, 0.122, 0.160] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
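The farthest-first selections above can be sketched as a greedy loop that repeatedly adds the vector with the largest minimum distance to the already-selected set (a sketch assuming Euclidean distance and an arbitrary first pick; the program's seed vector and tie-breaking are not visible in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily select k indices, each maximising its minimum
    Euclidean distance to the vectors selected so far."""
    selected = [start]
    # Minimum distance from every vector to the selected set.
    min_dist = [math.dist(v, vectors[start]) for v in vectors]
    while len(selected) < k:
        nxt = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], math.dist(v, vectors[nxt]))
    return selected
```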
Perform oracle with 100.00 accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 34 matches and 50 non-matches
Purity of oracle classification: 0.595
Entropy of oracle classification: 0.974
Number of true matches: 34
Number of false matches: 0
Number of true non-matches: 50
Number of false non-matches: 0
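The purity and entropy figures reported for each classified sample appear to be the majority-class fraction and the binary Shannon entropy of the match proportion; a sketch under that assumption:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary entropy of a classified sample."""
    total = num_matches + num_non_matches
    p = num_matches / total          # match proportion
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The sample above: 34 matches, 50 non-matches
purity, entropy = purity_entropy(34, 50)
```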
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 579 weight vectors
Based on 34 matches and 50 non-matches
Classified 272 matches and 307 non-matches
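The unclassified remainder of the cluster is then split by a binary SVM trained on the oracle-labelled sample. A minimal sketch assuming scikit-learn's `SVC` with default parameters (the kernel and settings actually used are not visible in the log):

```python
from sklearn.svm import SVC  # assumption: scikit-learn provides the SVM

def svm_split(train_vectors, train_labels, remaining_vectors):
    """Train an SVM on the oracle-classified sample and split the
    remaining weight vectors into predicted matches / non-matches."""
    clf = SVC()  # hypothetical choice of kernel and parameters
    clf.fit(train_vectors, train_labels)
    predictions = clf.predict(remaining_vectors)
    matches = [v for v, y in zip(remaining_vectors, predictions) if y == 1]
    non_matches = [v for v, y in zip(remaining_vectors, predictions) if y == 0]
    return matches, non_matches
```

The two predicted subsets then become the new clusters pushed onto the queue.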
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(272, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
(307, 0.5952380952380952, 0.9736680645496201, 0.40476190476190477)
Current size of match and non-match training data sets: 34 / 50
Selected cluster with (queue ordering: random):
- Purity 0.60 and entropy 0.97
- Size 272 weight vectors
- Estimated match proportion 0.405
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 272 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.800, 1.000, 0.333, 0.267, 0.180, 0.132, 0.281] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.091, 0.148] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.500, 1.000, 0.244, 0.171, 0.150, 0.194, 0.250] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.207, 0.160, 0.185, 0.212, 0.121] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.750, 1.000, 0.257, 0.184, 0.286, 0.216, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00 accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 44 matches and 25 non-matches
Purity of oracle classification: 0.638
Entropy of oracle classification: 0.945
Number of true matches: 44
Number of false matches: 0
Number of true non-matches: 25
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(20)293_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (20, 1 - acm diverg, 293), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)293_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 953
Number of entity ID pairs that occurred more than once: 0
Analyse set of 953 weight vectors
Containing 201 true matches and 752 true non-matches
(21.09% true matches)
Identified 908 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 874 (96.26%)
2 : 31 (3.41%)
3 : 2 (0.22%)
11 : 1 (0.11%)
Identified 1 non-pure unique weight vector (from 908 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 952
Number of unique weight vectors: 908
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(908, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 908 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 908 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00 accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 821 weight vectors
Based on 26 matches and 61 non-matches
Classified 119 matches and 702 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(119, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(702, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 702 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 702 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.667, 0.000, 0.650, 0.895, 0.706, 0.455, 0.600] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.067, 0.550, 0.636, 0.500, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.333, 0.545, 0.476, 0.727, 0.762] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
Perform oracle with 100.00 accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 13 matches and 59 non-matches
Purity of oracle classification: 0.819
Entropy of oracle classification: 0.681
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing file: diverg(20)780_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (20, 1 - acm diverg, 780), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)780_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 754
Number of entity ID pairs that occurred more than once: 0
Analyse set of 754 weight vectors
Containing 222 true matches and 532 true non-matches
(29.44% true matches)
Identified 718 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 699 (97.35%)
2 : 16 (2.23%)
3 : 2 (0.28%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 718 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 188
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 529
Removed 1 non-pure weight vector
Final number of weight vectors to use: 753
Number of unique weight vectors: 718
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(718, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 718 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 718 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.819, 1.000, 0.222, 0.214, 0.182, 0.214, 0.333] (False)
[1.000, 0.000, 0.636, 0.452, 0.632, 0.139, 0.762] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00 accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 28 matches and 56 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 634 weight vectors
Based on 28 matches and 56 non-matches
Classified 135 matches and 499 non-matches
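The purity and entropy figures reported after each oracle round are consistent with the majority-class fraction and the binary Shannon entropy of the match proportion in the labelled sample; a minimal sketch (the function name is hypothetical):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity: fraction of the majority class in the sample.
    Entropy: binary Shannon entropy (in bits) of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)
    return purity, entropy

# Reproduces the figures for 28 matches / 56 non-matches:
purity, entropy = purity_entropy(28, 56)
print(round(purity, 3), round(entropy, 3))  # 0.667 0.918
```

The cluster tuples printed under "Size, purity, entropy, and estimated match proportion" carry the same quantities; 0.9182958340544896 is exactly the binary entropy of a one-third match proportion.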
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(135, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(499, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 28 / 56
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.92
- Size 499 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 499 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.741, 0.474, 0.667, 0.500, 0.300] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.710, 0.000, 0.600, 0.654, 0.273, 0.290, 0.217] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 0.000, 0.633, 0.867, 0.500, 0.204, 0.396] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
Perform oracle with 100.00 accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 13 matches and 60 non-matches
Purity of oracle classification: 0.822
Entropy of oracle classification: 0.676
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
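The "Farthest first selection" step performed in each loop can be sketched as a greedy traversal over the cluster's weight vectors; this is a hypothetical re-implementation (Euclidean distance and the random start are assumptions, not confirmed by the log):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first traversal: start from one vector, then
    repeatedly add the vector whose distance to its nearest already
    selected vector is largest."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    first = int(rng.integers(len(vectors)))
    selected = [first]
    # min_dist[i] = distance from vector i to its closest selected vector
    min_dist = np.linalg.norm(vectors - vectors[first], axis=1)
    while len(selected) < k:
        nxt = int(min_dist.argmax())
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

Each newly picked vector only requires refreshing the per-vector minimum distance, so selecting k of n vectors costs O(kn) distance computations.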
42.0
Analysing file: diverg(10)739_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.978261
recall 0.150502
f-measure 0.26087
da 46
dm 0
ndm 0
tp 45
fp 1
tn 4.76529e+07
fn 254
Name: (10, 1 - acm diverg, 739), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)739_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 814
Number of entity ID pairs that occurred more than once: 0
Analyse set of 814 weight vectors
Containing 220 true matches and 594 true non-matches
(27.03% true matches)
Identified 758 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 722 (95.25%)
2 : 33 (4.35%)
3 : 2 (0.26%)
20 : 1 (0.13%)
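The occurrence tabulation above can be reproduced with two nested `Counter`s: one mapping each unique weight vector to its frequency, and one mapping each frequency to how many vectors share it. The toy vectors below are hypothetical:

```python
from collections import Counter

weight_vectors = [(1.0, 0.5), (1.0, 0.5),
                  (0.2, 0.9), (0.2, 0.9), (0.2, 0.9),
                  (0.7, 0.1)]
occurrences = Counter(weight_vectors)          # vector -> how often it occurs
distribution = Counter(occurrences.values())   # occurrence count -> number of vectors

total_unique = len(occurrences)
for count in sorted(distribution):
    n = distribution[count]
    print(f"{count} : {n} ({100 * n / total_unique:.2f}%)")
```

The percentages are taken over the unique vectors, matching how the log's distribution sums to 100% of the 758 unique weight vectors.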
Identified 1 non-pure unique weight vector (from 758 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 573
Removed 1 non-pure weight vector
Final number of weight vectors to use: 813
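The non-pure clean-up step (dropping the minority-class copies of any unique weight vector whose occurrences carry mixed labels) might look like the following sketch, assuming ties would count as matches:

```python
from collections import defaultdict

def remove_minority(pairs):
    """pairs: list of (weight_vector_tuple, is_match).
    Drops minority-class copies of non-pure unique vectors."""
    groups = defaultdict(list)
    for vec, label in pairs:
        groups[vec].append(label)
    kept = []
    for vec, labels in groups.items():
        match_frac = sum(labels) / len(labels)
        if 0.0 < match_frac < 1.0:
            majority = match_frac >= 0.5  # assumption: ties count as match
            kept += [(vec, majority)] * labels.count(majority)
        else:
            kept += [(vec, labels[0])] * len(labels)
    return kept

# A vector seen 20 times with 19 match labels (pureness 0.950) loses its
# single non-match copy, mirroring "Removed 1 non-pure weight vectors".
data = [(("v1",), True)] * 19 + [(("v1",), False)] + [(("v2",), False)] * 3
print(len(remove_minority(data)))  # 22
```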
Number of unique weight vectors: 758
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(758, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 758 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 758 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00 accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 673 weight vectors
Based on 27 matches and 58 non-matches
Classified 146 matches and 527 non-matches
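The "SVM classification" step trains on the oracle-labelled samples and labels the cluster's remaining weight vectors; a minimal sketch, assuming a linear SVM (scikit-learn's LinearSVC here, which the log does not confirm) and hypothetical toy data:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Oracle-labelled training data: 1 = match, 0 = non-match.
train_X = np.array([[0.90, 1.00, 0.80], [1.00, 1.00, 0.90], [0.95, 0.90, 1.00],
                    [0.10, 0.00, 0.20], [0.20, 0.10, 0.00], [0.00, 0.20, 0.10]])
train_y = np.array([1, 1, 1, 0, 0, 0])

# Remaining weight vectors in the cluster, to be classified by the SVM.
rest_X = np.array([[0.85, 0.95, 0.90], [0.15, 0.05, 0.10]])

clf = LinearSVC().fit(train_X, train_y)
pred = clf.predict(rest_X)
print(f"Classified {int(pred.sum())} matches and {int((pred == 0).sum())} non-matches")
```

The predicted matches and non-matches then seed the two child clusters that appear in the queue at the next loop.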
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(146, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(527, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 527 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 527 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.875, 0.545, 0.789, 0.556, 0.385] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[0.800, 0.000, 0.667, 0.636, 0.500, 0.250, 0.400] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
Perform oracle with 100.00 accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 5 matches and 67 non-matches
Purity of oracle classification: 0.931
Entropy of oracle classification: 0.364
Number of true matches: 5
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
46.0
Analysing file: diverg(15)281_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (15, 1 - acm diverg, 281), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)281_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 711
Number of entity ID pairs that occurred more than once: 0
Analyse set of 711 weight vectors
Containing 203 true matches and 508 true non-matches
(28.55% true matches)
Identified 685 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 671 (97.96%)
2 : 11 (1.61%)
3 : 2 (0.29%)
12 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 685 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 177
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 507
Removed 1 non-pure weight vector
Final number of weight vectors to use: 710
Number of unique weight vectors: 685
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(685, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 685 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 685 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00 accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 28 matches and 56 non-matches
Purity of oracle classification: 0.667
Entropy of oracle classification: 0.918
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 601 weight vectors
Based on 28 matches and 56 non-matches
Classified 137 matches and 464 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(137, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
(464, 0.6666666666666666, 0.9182958340544896, 0.3333333333333333)
Current size of match and non-match training data sets: 28 / 56
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.92
- Size 137 weight vectors
- Estimated match proportion 0.333
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 137 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
Perform oracle with 100.00 accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 49 matches and 4 non-matches
Purity of oracle classification: 0.925
Entropy of oracle classification: 0.386
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(15)504_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (15, 1 - acm diverg, 504), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)504_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 790
Number of entity ID pairs that occurred more than once: 0
Analyse set of 790 weight vectors
Containing 212 true matches and 578 true non-matches
(26.84% true matches)
Identified 738 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 703 (95.26%)
2 : 32 (4.34%)
3 : 2 (0.27%)
17 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 738 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 180
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 557
Removed 1 non-pure weight vector
Final number of weight vectors to use: 789
Number of unique weight vectors: 738
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(738, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 738 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 738 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
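The "far" method used for the selections in this log is farthest-first traversal: start from one vector, then repeatedly add the vector whose distance to the nearest already-selected vector is largest. A minimal sketch using Euclidean distance (the seed choice and the distance metric are assumptions; the original program may differ):

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Select up to k vectors, each maximising distance to the nearest selected one."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[seed_index]]
    remaining = [v for i, v in enumerate(vectors) if i != seed_index]
    while len(selected) < k and remaining:
        # pick the candidate farthest from its nearest selected vector
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

sample = farthest_first([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.0, 1.0]], 3)
print(sample)  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

This greedy scheme is quadratic in the number of vectors per selected point, which is cheap at the cluster sizes (hundreds of vectors) seen in this log.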
Perform oracle with 100.00 accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 27 matches and 58 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 653 weight vectors
Based on 27 matches and 58 non-matches
Classified 139 matches and 514 non-matches
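The SVM step above trains on the oracle-labelled samples and splits the unlabelled remainder of the cluster into a predicted-match and a predicted-non-match sub-cluster, which then re-enter the queue. A sketch with scikit-learn (the kernel choice and data are assumptions for illustration; the original program may use different settings):

```python
from sklearn.svm import SVC

# oracle-labelled training data (illustrative 2-D weight vectors)
X_train = [[1.0, 1.0], [0.9, 0.8], [0.1, 0.0], [0.2, 0.3]]
y_train = [True, True, False, False]   # match / non-match labels from the oracle

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# split the remaining unlabelled vectors into two sub-clusters
unlabelled = [[0.95, 0.9], [0.15, 0.1], [0.8, 0.7]]
predictions = clf.predict(unlabelled)
matches = [v for v, p in zip(unlabelled, predictions) if p]
non_matches = [v for v, p in zip(unlabelled, predictions) if not p]
print(len(matches), len(non_matches))
```

In the log, 27 matches and 58 non-matches serve as training data and the 653 remaining vectors are split into sub-clusters of 139 and 514.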
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(139, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
(514, 0.6823529411764706, 0.9018043446575508, 0.3176470588235294)
Current size of match and non-match training data sets: 27 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.90
- Size 514 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 514 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[1.000, 0.000, 0.375, 0.409, 0.400, 0.333, 0.611] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[0.683, 1.000, 0.246, 0.239, 0.070, 0.255, 0.258] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.480, 0.786, 0.773, 0.286, 0.273] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.808, 0.435, 0.700, 0.538, 0.688] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.833, 0.857, 0.316, 0.333, 0.300] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 0.000, 0.700, 0.214, 0.368, 0.529, 0.714] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[0.833, 0.000, 0.500, 0.500, 0.444, 0.059, 0.667] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 0.000, 0.667, 0.500, 0.455, 0.259, 0.250] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.900, 0.429, 0.412, 0.588, 0.357] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00 accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 8 matches and 64 non-matches
Purity of oracle classification: 0.889
Entropy of oracle classification: 0.503
Number of true matches: 8
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(15)25_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.140468
f-measure 0.246334
da 42
dm 0
ndm 0
tp 42
fp 0
tn 4.76529e+07
fn 257
Name: (15, 1 - acm diverg, 25), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)25_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1037
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1037 weight vectors
Containing 221 true matches and 816 true non-matches
(21.31% true matches)
Identified 983 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 946 (96.24%)
2 : 34 (3.46%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 983 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 187
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 795
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1036
Number of unique weight vectors: 983
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(983, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 983 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 983 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.214, 0.244, 0.103, 0.441, 0.560] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.481, 0.217, 0.125, 0.148, 0.148] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00 accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 896 weight vectors
Based on 26 matches and 61 non-matches
Classified 137 matches and 759 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(137, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(759, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.70 and entropy 0.88
- Size 137 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 137 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.879, 1.000, 0.750, 0.750, 0.735, 0.733, 0.722] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
Perform oracle with 100.00 accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 50 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.139
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
42.0
Analysing file: diverg(20)772_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 772), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)772_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00 accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 153 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 153 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(20)337_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 337), dtype: object
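The precision/recall/f-measure triple in the metrics block above follows directly from the raw counts (tp = 39, fp = 0, fn = 260). A quick check of the arithmetic:

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Counts from the metrics block: tp=39, fp=0, fn=260
p, r, f = prf(39, 0, 260)
print(p, round(r, 6), round(f, 6))  # 1.0 0.130435 0.230769
```

With fp = 0 precision is trivially 1, and the F-measure reduces to 2·tp / (2·tp + fn) = 78/338 = 3/13.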
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)337_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
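A frequency distribution like the one above can be computed by hashing each weight vector and counting duplicates. A minimal sketch (the vectors below are toy values, not the actual data):

```python
from collections import Counter

def frequency_distribution(weight_vectors):
    """Map occurrence_count -> number of unique weight vectors
    that occur exactly that many times."""
    occ = Counter(tuple(v) for v in weight_vectors)  # vector -> count
    return Counter(occ.values())                     # count -> how many vectors

vecs = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.1)]
print(frequency_distribution(vecs))  # Counter({1: 2, 2: 1})
```

The number of unique weight vectors is then simply the sum of the distribution's values.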
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
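The "farthest first" selection used for the samples above can be sketched as a greedy traversal: repeatedly pick the vector whose distance to its nearest already-selected vector is largest. The tool's starting-point choice and tie-breaking are not shown in the log, so this sketch starts deterministically from the first vector:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over Euclidean distance."""
    selected = [vectors[0]]  # deterministic start for this sketch
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the vector maximising the distance to its nearest selected vector
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

vecs = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.5, 0.5)]
print(farthest_first(vecs, 3))
# [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
```

This greedy rule favours vectors spread across the whole cluster, which is why the selections above mix clear matches, clear non-matches, and borderline cases.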
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(20)826_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 826), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)826_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1093
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1093 weight vectors
Containing 226 true matches and 867 true non-matches
(20.68% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 999 (96.43%)
2 : 34 (3.28%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1092
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 25 matches and 63 non-matches
Classified 131 matches and 817 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(131, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(817, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random) with:
- Purity 0.72 and entropy 0.86
- Size 131 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 49
Farthest first selection of 49 weight vectors from 131 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.933, 1.000, 1.000, 0.900, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
Perform oracle with 100.00% accuracy on 49 weight vectors
The oracle will correctly classify 49 weight vectors and wrongly classify 0
Classified 48 matches and 1 non-match
Purity of oracle classification: 0.980
Entropy of oracle classification: 0.144
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 49 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analyzing file: diverg(20)331_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 331), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)331_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1100
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1100 weight vectors
Containing 227 true matches and 873 true non-matches
(20.64% true matches)
Identified 1043 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1006 (96.45%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1043 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 852
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1099
Number of unique weight vectors: 1043
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1043, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1043 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1043 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
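The farthest-first selection above can be sketched as a greedy max-min traversal. This is a minimal sketch assuming Euclidean distance and seeding from the first vector; the log does not show which metric or seeding rule the implementation actually uses.

```python
# Minimal farthest-first traversal sketch (assumed: Euclidean
# distance, seed = first vector; neither is shown in the log).
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors, each maximising its minimum
    distance to the vectors selected so far."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[0]]                    # seed: first vector
    min_dist = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        # vector farthest from the current selection
        i = max(range(len(vectors)), key=min_dist.__getitem__)
        selected.append(vectors[i])
        # refresh each vector's distance to its nearest selected point
        min_dist = [min(d, dist(v, vectors[i]))
                    for d, v in zip(min_dist, vectors)]
    return selected
```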
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
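The simulated oracle step can be sketched as flipping the true labels of a randomly chosen (1 - accuracy) share of the sample. The rounding and selection rule here are assumptions, not shown in the log.

```python
# Sketch of the simulated oracle (assumed: flip a round(n * (1 - a))
# subset of labels chosen uniformly at random).
import random

def simulate_oracle(true_labels, accuracy, seed=0):
    labels = list(true_labels)
    n_wrong = round(len(labels) * (1.0 - accuracy))
    rng = random.Random(seed)
    for i in rng.sample(range(len(labels)), n_wrong):
        labels[i] = not labels[i]       # wrongly classified vector
    return labels

# with 100.00% accuracy, all 88 labels come back unchanged
truth = [True] * 23 + [False] * 65
assert simulate_oracle(truth, 1.0) == truth
```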
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
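The purity and entropy figures above are consistent with purity as the majority-class fraction and entropy as the base-2 binary entropy of the match proportion; a small sketch reproducing them from the 23 matches and 65 non-matches:

```python
# Purity as majority-class fraction, entropy as binary (base-2)
# entropy of the match proportion (both consistent with the log).
import math

def purity(n_match, n_nonmatch):
    total = n_match + n_nonmatch
    return max(n_match, n_nonmatch) / total

def entropy(n_match, n_nonmatch):
    total = n_match + n_nonmatch
    p = n_match / total
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(round(purity(23, 65), 3))   # -> 0.739
print(round(entropy(23, 65), 3))  # -> 0.829
```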
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 955 weight vectors
Based on 23 matches and 65 non-matches
Classified 0 matches and 955 non-matches
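The SVM split step can be sketched with scikit-learn's `SVC`. The kernel, parameters, and the stand-in vectors below are assumptions; the log only shows the training counts (23 matches, 65 non-matches) and that the remaining 955 vectors are classified into two new clusters.

```python
# Hedged sketch of the SVM split step; kernel, parameters, and
# synthetic stand-in data are assumptions, not from the log.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = np.vstack([rng.uniform(0.6, 1.0, (23, 7)),   # match-like
                     rng.uniform(0.0, 0.4, (65, 7))])  # non-match-like
y_train = np.array([1] * 23 + [0] * 65)

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# the remaining unclassified vectors in the cluster are split into a
# predicted-match cluster and a predicted-non-match cluster
X_rest = rng.uniform(0.0, 1.0, (955, 7))
pred = clf.predict(X_rest)
match_cluster = X_rest[pred == 1]
nonmatch_cluster = X_rest[pred == 0]
```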
39.0
Analysing file: diverg(10)837_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (10, 1 - acm diverg, 837), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)837_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 761
Number of entity ID pairs that occurred more than once: 0
Analyse set of 761 weight vectors
Containing 187 true matches and 574 true non-matches
(24.57% true matches)
Identified 719 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 688 (95.69%)
2 : 28 (3.89%)
3 : 2 (0.28%)
11 : 1 (0.14%)
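The occurrence distribution above can be reproduced with two Counters: one mapping each unique vector to its count, and one mapping each count to how many vectors occur that often. The values here are illustrative, not the log's data.

```python
# Occurrence-frequency distribution via nested Counters
# (illustrative stand-in vectors, not the log's data).
from collections import Counter

vectors = [(1.0, 0.5)] * 11 + [(0.2, 0.3)] * 2 + [(0.9, 0.1)]
occurrences = Counter(vectors)                  # vector -> count
distribution = Counter(occurrences.values())    # count -> num vectors
for occ in sorted(distribution):
    n = distribution[occ]
    print(f"{occ} : {n} ({100.0 * n / len(occurrences):.2f}%)")
```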
Identified 1 non-pure unique weight vector (from 719 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 553
Removed 1 non-pure weight vector
Final number of weight vectors to use: 760
Number of unique weight vectors: 719
Time to load and analyse the weight vector file: 0.01 sec
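The pureness filter above (compute each unique vector's match fraction, then drop the minority-class copies of any vector that is not fully pure) can be sketched as follows. Treating a 0.5 tie as match-majority is an assumption; the log contains no tied vector.

```python
# Pureness filter sketch: drop minority-class copies of non-pure
# unique weight vectors (tie handling at 0.5 is an assumption).
from collections import defaultdict

def remove_minority(records):
    """records: list of (weight_vector_tuple, is_match) pairs."""
    by_vec = defaultdict(list)
    for vec, label in records:
        by_vec[vec].append(label)
    kept = []
    for vec, labels in by_vec.items():
        pureness = sum(labels) / len(labels)
        majority = pureness >= 0.5          # True: match is the majority
        for label in labels:
            if pureness in (0.0, 1.0) or label == majority:
                kept.append((vec, label))
    return kept

# one vector occurs 11 times as 10 matches and 1 non-match
# (pureness 0.909): only the single minority copy is removed
data = [((1.0, 0.9), True)] * 10 + [((1.0, 0.9), False)] \
     + [((0.1, 0.2), False)] * 3
assert len(remove_minority(data)) == 13
```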
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(719, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 719 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 719 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 84 weight vectors
The oracle will correctly classify 84 weight vectors and wrongly classify 0
Classified 30 matches and 54 non-matches
Purity of oracle classification: 0.643
Entropy of oracle classification: 0.940
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 84 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 635 weight vectors
Based on 30 matches and 54 non-matches
Classified 308 matches and 327 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 84
Size, purity, entropy, and estimated match proportion of clusters in queue:
(308, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
(327, 0.6428571428571429, 0.9402859586706309, 0.35714285714285715)
Current size of match and non-match training data sets: 30 / 54
Selected cluster (queue ordering: random):
- Purity 0.64 and entropy 0.94
- Size 308 weight vectors
- Estimated match proportion 0.357
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 308 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 41 matches and 28 non-matches
Purity of oracle classification: 0.594
Entropy of oracle classification: 0.974
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing file: diverg(20)664_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 664), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)664_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1075
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1075 weight vectors
Containing 208 true matches and 867 true non-matches
(19.35% true matches)
Identified 1028 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 993 (96.60%)
2 : 32 (3.11%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1028 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 181
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 846
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1074
Number of unique weight vectors: 1028
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1028, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1028 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1028 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 25 matches and 63 non-matches
Purity of oracle classification: 0.716
Entropy of oracle classification: 0.861
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 940 weight vectors
Based on 25 matches and 63 non-matches
Classified 121 matches and 819 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(121, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
(819, 0.7159090909090909, 0.8609652558547649, 0.2840909090909091)
Current size of match and non-match training data sets: 25 / 63
Selected cluster (queue ordering: random):
- Purity 0.72 and entropy 0.86
- Size 121 weight vectors
- Estimated match proportion 0.284
Sample size for this cluster: 48
Farthest first selection of 48 weight vectors from 121 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
Perform oracle with 100.00% accuracy on 48 weight vectors
The oracle will correctly classify 48 weight vectors and wrongly classify 0
Classified 46 matches and 2 non-matches
Purity of oracle classification: 0.958
Entropy of oracle classification: 0.250
Number of true matches: 46
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 48 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(10)178_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (10, 1 - acm diverg, 178), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)178_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 829
Number of entity ID pairs that occurred more than once: 0
Analyse set of 829 weight vectors
Containing 227 true matches and 602 true non-matches
(27.38% true matches)
Identified 772 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 735 (95.21%)
2 : 34 (4.40%)
3 : 2 (0.26%)
20 : 1 (0.13%)
Identified 1 non-pure unique weight vector (from 772 unique weight vectors)
Pureness (as proportion of matches) for each unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 581
Removed 1 non-pure weight vector
Final number of weight vectors to use: 828
Number of unique weight vectors: 772
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(772, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 772 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 772 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
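The "far" selection logged above is a farthest-first traversal: after an initial pick, each subsequent vector maximises its minimum distance to the vectors already selected. A minimal Gonzalez-style sketch with illustrative names (the tool's actual implementation and distance metric are not shown in this log):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Select k vectors, each maximising its minimum Euclidean
    distance to the already-selected set."""
    vecs = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vecs)))]      # arbitrary start point
    # minimum distance from every vector to the selected set so far
    dists = np.linalg.norm(vecs - vecs[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))                # farthest remaining vector
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(vecs - vecs[nxt], axis=1))
    return selected
```

Because the distance of every selected vector to the set is zero, already-chosen vectors are never picked again while distinct candidates remain.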
Perform oracle with 100.00% accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 28 matches and 57 non-matches
Purity of oracle classification: 0.671
Entropy of oracle classification: 0.914
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
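The purity and entropy figures reported for this sample follow directly from the match / non-match counts: purity is the majority-class fraction and entropy the binary Shannon entropy of the match proportion. A short sketch reproducing the logged values (formulas inferred from the numbers in this log, not taken from the tool's source):

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Purity = majority-class fraction of the sample; entropy = binary
    Shannon entropy (in bits) of the match proportion."""
    total = num_matches + num_non_matches
    p = num_matches / total              # estimated match proportion
    purity = max(p, 1.0 - p)
    entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return purity, entropy

# 28 matches and 57 non-matches, as classified by the oracle above
purity, entropy = purity_entropy(28, 57)
print(round(purity, 3), round(entropy, 3))  # 0.671 0.914
```

The same match proportion (28/85 ≈ 0.329) reappears below as the estimated match proportion assigned to both child clusters.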
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 687 weight vectors
Based on 28 matches and 57 non-matches
Classified 150 matches and 537 non-matches
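The SVM step trains on the oracle-labelled sample and splits the remaining, unlabelled weight vectors of the cluster into predicted matches and non-matches. A minimal sketch assuming a scikit-learn-style classifier; the kernel and parameters actually used by the tool are not visible in this log:

```python
import numpy as np
from sklearn.svm import SVC

def svm_split(labelled_vecs, labels, unlabelled_vecs):
    """Train on the oracle-labelled sample (labels: True = match) and
    split the remaining vectors into predicted matches / non-matches."""
    clf = SVC(kernel="linear")            # kernel choice is an assumption
    clf.fit(labelled_vecs, labels)
    pred = clf.predict(unlabelled_vecs)   # boolean predictions
    return unlabelled_vecs[pred], unlabelled_vecs[~pred]
```

The two returned groups correspond to the two clusters pushed onto the queue in the next loop iteration.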
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(150, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
(537, 0.6705882352941176, 0.914324246431782, 0.32941176470588235)
Current size of match and non-match training data sets: 28 / 57
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 537 weight vectors
- Estimated match proportion 0.329
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 537 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[0.700, 1.000, 0.250, 0.042, 0.154, 0.222, 0.222] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.550, 0.857, 0.833, 0.389, 0.688] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.296, 0.667, 0.421, 0.450, 0.692] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 1.000, 0.233, 0.293, 0.256, 0.175, 0.327] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.500, 0.364, 0.833, 0.417, 0.786] (False)
[1.000, 0.000, 0.385, 0.478, 0.643, 0.692, 0.611] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 9 matches and 64 non-matches
Purity of oracle classification: 0.877
Entropy of oracle classification: 0.539
Number of true matches: 9
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)152_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (15, 1 - acm diverg, 152), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)152_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 683
Number of entity ID pairs that occurred more than once: 0
Analyse set of 683 weight vectors
Containing 201 true matches and 482 true non-matches
(29.43% true matches)
Identified 638 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 604 (94.67%)
2 : 31 (4.86%)
3 : 2 (0.31%)
11 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 638 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 176
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 461
Removed 1 non-pure weight vector
Final number of weight vectors to use: 682
Number of unique weight vectors: 638
Time to load and analyse the weight vector file: 0.01 sec
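The uniqueness and pureness analysis above (frequency distribution of occurrences, pureness per unique vector, identification of non-pure vectors) can be sketched with `collections.Counter`; the names are illustrative and the exact removal rule for minority-class occurrences is inferred from the log:

```python
from collections import Counter, defaultdict

def analyse_weight_vectors(vectors, is_match):
    """Frequency distribution and pureness of unique weight vectors.
    Pureness = fraction of occurrences of a unique vector that are
    true matches; vectors with 0 < pureness < 1 are non-pure."""
    occ = Counter(map(tuple, vectors))
    matches = defaultdict(int)
    for vec, label in zip(vectors, is_match):
        matches[tuple(vec)] += int(label)
    freq_dist = Counter(occ.values())          # occurrence count -> number of unique vectors
    pureness = {v: matches[v] / n for v, n in occ.items()}
    non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]
    return freq_dist, pureness, non_pure
```

For a non-pure vector, the minority-class occurrences would then be removed, which is why exactly 1 vector is dropped from the set above.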
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(638, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 638 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 638 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 26 matches and 57 non-matches
Purity of oracle classification: 0.687
Entropy of oracle classification: 0.897
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 555 weight vectors
Based on 26 matches and 57 non-matches
Classified 129 matches and 426 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(129, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
(426, 0.6867469879518072, 0.8968928834064589, 0.3132530120481928)
Current size of match and non-match training data sets: 26 / 57
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.90
- Size 129 weight vectors
- Estimated match proportion 0.313
Sample size for this cluster: 51
Farthest first selection of 51 weight vectors from 129 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.938, 1.000, 1.000, 0.905, 1.000, 1.000, 1.000] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.875, 0.778, 0.829, 0.917, 0.826] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 51 weight vectors
The oracle will correctly classify 51 weight vectors and wrongly classify 0
Classified 49 matches and 2 non-matches
Purity of oracle classification: 0.961
Entropy of oracle classification: 0.239
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 2
Number of false non-matches: 0
Deleted 51 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
58.0
Analysing the file: diverg(20)260_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 260), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)260_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 23 matches and 65 non-matches
Classified 109 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 109 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 44
Farthest first selection of 44 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.842, 0.833, 0.895, 0.833, 0.889] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
Perform oracle with 100.00% accuracy on 44 weight vectors
The oracle will correctly classify 44 weight vectors and wrongly classify 0
Classified 43 matches and 1 non-matches
Purity of oracle classification: 0.977
Entropy of oracle classification: 0.156
Number of true matches: 43
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 44 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
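The purity and entropy figures reported after each oracle round are the majority-class fraction and the binary Shannon entropy of the labelled sample; a minimal sketch of how they follow from the match/non-match counts (counts taken from the round above):

```python
import math

def purity_entropy(matches, non_matches):
    """Majority-class fraction and binary Shannon entropy of a labelled sample."""
    total = matches + non_matches
    p = matches / total
    purity = max(p, 1 - p)
    entropy = 0.0
    for q in (p, 1 - p):
        if q > 0:  # 0 * log2(0) is taken as 0
            entropy -= q * math.log2(q)
    return purity, entropy

purity, entropy = purity_entropy(43, 1)
print(round(purity, 3), round(entropy, 3))  # 0.977 0.156
```

These reproduce the "Purity of oracle classification: 0.977" and "Entropy of oracle classification: 0.156" lines above.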
39.0
Analysing file: diverg(15)656_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985714
recall 0.230769
f-measure 0.373984
da 70
dm 0
ndm 0
tp 69
fp 1
tn 4.76529e+07
fn 230
Name: (15, 1 - acm diverg, 656), dtype: object
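The precision, recall, and f-measure in the Series dump above follow directly from the tp/fp/fn counts it reports; a minimal sketch of that computation (counts taken from the dump above):

```python
# Recompute the baseline metrics from the raw counts printed above.
tp, fp, fn = 69, 1, 230

precision = tp / (tp + fp)  # 69 / 70
recall = tp / (tp + fn)     # 69 / 299
f_measure = 2 * precision * recall / (precision + recall)

print(round(precision, 6))  # 0.985714
print(round(recall, 6))     # 0.230769
print(round(f_measure, 6))  # 0.373984
```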
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)656_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1005
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1005 weight vectors
Containing 195 true matches and 810 true non-matches
(19.40% true matches)
Identified 963 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 928 (96.37%)
2 : 32 (3.32%)
3 : 2 (0.21%)
7 : 1 (0.10%)
Identified 0 non-pure unique weight vectors (from 963 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.000 : 790
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 1005
Number of unique weight vectors: 963
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
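The uniqueness, frequency, and pureness statistics above can be reproduced by grouping identical weight vectors and tallying their true labels; a minimal sketch on toy data (the real vectors come from the CSV loaded above):

```python
from collections import Counter

# Toy (vector, is_match) pairs standing in for the loaded CSV rows.
vectors = [
    ((1.0, 0.9), True), ((1.0, 0.9), True),   # duplicate, pure match
    ((0.2, 0.1), False), ((0.5, 0.5), True),
    ((0.5, 0.5), False),                      # non-pure: mixed labels
]

freq = Counter(v for v, _ in vectors)         # occurrences per unique vector
labels = {}
for v, is_match in vectors:
    labels.setdefault(v, []).append(is_match)

occurrence_dist = Counter(freq.values())      # occurrence : number of vectors
pureness = {v: sum(ls) / len(ls) for v, ls in labels.items()}
non_pure = [v for v, p in pureness.items() if 0.0 < p < 1.0]

print(len(freq))                              # 3
print(sorted(occurrence_dist.items()))        # [(1, 1), (2, 2)]
print(non_pure)                               # [(0.5, 0.5)]
```

Non-pure vectors (identical weights with conflicting labels) are the ones the tool removes before clustering.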
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(963, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 963 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 963 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.739, 0.857, 0.909, 0.765, 0.524] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
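The "far" sampling used for the list above is farthest-first traversal: start from one vector, then repeatedly pick the vector whose minimum distance to the already-selected set is largest. A minimal sketch with plain Euclidean distance (the actual distance measure and seed choice of the tool are assumptions here):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedy farthest-first traversal: repeatedly select the vector with
    the largest minimum Euclidean distance to the already-selected set."""
    selected = [vectors[start]]
    remaining = [v for i, v in enumerate(vectors) if i != start]
    while len(selected) < k and remaining:
        # A candidate's distance to the selected set is its minimum
        # distance to any already-selected vector.
        best = max(remaining,
                   key=lambda v: min(math.dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(pts, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

This spreads the sample across the cluster, which is why the selected vectors above mix clear matches, clear non-matches, and borderline cases.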
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 28 matches and 59 non-matches
Purity of oracle classification: 0.678
Entropy of oracle classification: 0.906
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 876 weight vectors
Based on 28 matches and 59 non-matches
Classified 138 matches and 738 non-matches
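After each oracle round, the remaining vectors of the cluster are split by an SVM trained on the oracle-labelled sample, as in the "SVM classification of 876 weight vectors" step above. A minimal scikit-learn sketch on synthetic stand-in data (the kernel, parameters, and data layout are assumptions, not the tool's actual settings):

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)

# Stand-ins for the oracle-labelled sample: matches clustered near 1.0,
# non-matches near 0.2, in a 7-dimensional weight space.
X_match = rng.uniform(0.7, 1.0, size=(28, 7))
X_non = rng.uniform(0.0, 0.4, size=(59, 7))
X_train = np.vstack([X_match, X_non])
y_train = np.array([1] * 28 + [0] * 59)

clf = svm.SVC(kernel="linear")  # kernel choice is an assumption
clf.fit(X_train, y_train)

# Classify the unlabelled remainder of the cluster; the predicted
# matches and non-matches form the two sub-clusters queued next.
X_rest = rng.uniform(0.0, 1.0, size=(876, 7))
pred = clf.predict(X_rest)
print((pred == 1).sum(), (pred == 0).sum())
```

The two predicted groups correspond to the (138, ...) and (738, ...) clusters that appear in the queue in Loop 2.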
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(138, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
(738, 0.6781609195402298, 0.9063701886077911, 0.3218390804597701)
Current size of match and non-match training data sets: 28 / 59
Selected cluster (queue ordering: random) with:
- Purity 0.68 and entropy 0.91
- Size 138 weight vectors
- Estimated match proportion 0.322
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 138 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.694, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 48 matches and 4 non-matches
Purity of oracle classification: 0.923
Entropy of oracle classification: 0.391
Number of true matches: 48
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
70.0
Analysing file: diverg(20)570_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 570), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)570_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
57.0
Analysing file: diverg(20)461_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 461), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)461_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
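The purity and entropy figures logged above are consistent with the majority-class fraction and the binary (Shannon, base-2) entropy of the match/non-match split. A minimal sketch, assuming exactly those definitions:

```python
import math

def purity_entropy(num_matches, num_non_matches):
    """Majority-class purity and binary entropy of a match/non-match split."""
    total = num_matches + num_non_matches
    p = num_matches / total  # match proportion
    purity = max(p, 1.0 - p)
    # Binary Shannon entropy in bits; 0.0 for a pure split, 1.0 for a 50/50 split
    entropy = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return purity, entropy

# 28 matches / 58 non-matches, as in the oracle round above
purity, entropy = purity_entropy(28, 58)
print(round(purity, 3), round(entropy, 3))  # 0.674 0.91
```

This also explains the initial queue entry `(1035, 0.5, 1.0, 0.5)` later in the log: with no labels yet, the estimated match proportion is 0.5, giving purity 0.5 and entropy 1.0.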
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster with (queue ordering: random):
- Purity 0.67 and entropy 0.91
- Size 566 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 73
Farthest first selection of 73 weight vectors from 566 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.909, 0.393, 0.500, 0.471, 0.643] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.867, 0.333, 0.833, 0.143, 0.308] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.939, 1.000, 0.273, 0.083, 0.227, 0.095, 0.294] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.800, 0.000, 0.462, 0.636, 0.364, 0.053, 0.625] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.037, 0.450, 0.727, 0.400, 0.429] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.240, 0.714, 0.455, 0.778, 0.591] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.000, 0.600, 0.647, 0.667, 0.238] (False)
[0.800, 0.000, 0.375, 0.143, 0.267, 0.467, 0.333] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[1.000, 0.000, 0.458, 0.909, 0.350, 0.438, 0.375] (False)
[1.000, 0.000, 0.667, 0.429, 0.789, 0.444, 0.462] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.400, 0.393, 0.318, 0.647, 0.455] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.696, 0.357, 0.909, 0.000, 0.000] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.660, 1.000, 0.222, 0.176, 0.174, 0.077, 0.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[1.000, 0.000, 0.550, 0.833, 0.636, 0.875, 0.545] (False)
[1.000, 0.000, 0.320, 0.545, 0.773, 0.643, 0.591] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 0.000, 0.846, 0.778, 0.727, 0.632, 0.875] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 0.667, 0.467, 0.235, 0.083, 0.467] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 0.000, 0.458, 0.909, 0.250, 0.875, 0.563] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[1.000, 0.000, 0.727, 0.545, 0.263, 0.889, 0.692] (False)
[1.000, 0.000, 0.300, 0.357, 0.818, 0.000, 0.000] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[1.000, 0.000, 0.407, 0.818, 0.625, 0.400, 0.889] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.417, 0.348, 0.733, 0.917, 0.706] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
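The "farthest first" sampling that produced the list above can be sketched as a greedy max-min traversal: each new pick maximises its distance to the already-selected set. A minimal version, assuming Euclidean distance and an arbitrary first pick (the actual seeding and metric are not shown in the log):

```python
import math

def farthest_first(vectors, k, start=0):
    """Greedily pick k vectors, each maximising its distance to the picked set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[start]]
    while len(selected) < k:
        # Pick the vector whose nearest already-selected neighbour is farthest away
        best = max(vectors, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
    return selected

corners = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.2, 0.1]]
print(farthest_first(corners, 3))  # picks the three mutually distant corners
```

The design intent is coverage: spreading the labelling budget over the extremes of the cluster rather than over near-duplicate weight vectors.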
Perform oracle with 100.00% accuracy on 73 weight vectors
The oracle will correctly classify 73 weight vectors and wrongly classify 0
Classified 7 matches and 66 non-matches
Purity of oracle classification: 0.904
Entropy of oracle classification: 0.456
Number of true matches: 7
Number of false matches: 0
Number of true non-matches: 66
Number of false non-matches: 0
Deleted 73 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(15)572_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 572), dtype: object
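The precision, recall and f-measure fields in the record above follow directly from the reported tp/fp/fn counts. A minimal sketch, assuming the standard definitions:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# tp=39, fp=0, fn=260, as in the record above
p, r, f = prf(39, 0, 260)
print(p, round(r, 6), round(f, 6))  # 1.0 0.130435 0.230769
```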
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)572_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1092
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1092 weight vectors
Containing 226 true matches and 866 true non-matches
(20.70% true matches)
Identified 1035 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 998 (96.43%)
2 : 34 (3.29%)
3 : 2 (0.19%)
20 : 1 (0.10%)
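The unique-vector count and occurrence distribution above can be reproduced with a counter over the weight vectors. A minimal sketch (the example vectors below are illustrative, not taken from the file):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Count unique vectors and how many occur once, twice, etc."""
    per_vector = Counter(map(tuple, vectors))      # vector -> occurrence count
    distribution = Counter(per_vector.values())    # occurrences -> number of vectors
    return len(per_vector), dict(sorted(distribution.items()))

vecs = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(occurrence_distribution(vecs))  # (3, {1: 2, 2: 1})
```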
Identified 1 non-pure unique weight vector (from 1035 unique weight vectors)
Pureness (as a percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1091
Number of unique weight vectors: 1035
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1035, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1035 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1035 vectors
The selected farthest weight vectors are:
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.700, 0.333, 0.750, 0.636, 0.263] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.038, 0.783, 0.786, 0.615, 0.524] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.769, 0.217, 0.786, 0.000, 0.000] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.000, 0.714, 0.091, 0.333, 0.286, 0.545] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.526, 0.786, 0.304, 0.647, 0.571] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.800, 0.609, 0.857, 0.769, 0.579] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.667, 0.000, 0.650, 0.467, 0.706, 0.389, 0.737] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 24 matches and 64 non-matches
Purity of oracle classification: 0.727
Entropy of oracle classification: 0.845
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 947 weight vectors
Based on 24 matches and 64 non-matches
Classified 91 matches and 856 non-matches
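The step above trains a classifier on the oracle-labelled vectors (24 matches, 64 non-matches) and splits the remaining unlabelled vectors into two candidate clusters. The log uses an SVM (scikit-learn's `SVC` would be the natural fit); as a self-contained stand-in, here is a nearest-centroid sketch of the same train-then-split flow:

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def split_by_training(unlabelled, match_train, non_match_train):
    """Split unlabelled vectors by their nearer class centroid (SVM stand-in)."""
    cm, cn = centroid(match_train), centroid(non_match_train)
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    matches = [v for v in unlabelled if sqdist(v, cm) < sqdist(v, cn)]
    non_matches = [v for v in unlabelled if sqdist(v, cm) >= sqdist(v, cn)]
    return matches, non_matches

m, n = split_by_training([[0.9, 0.9], [0.1, 0.2]], [[1.0, 1.0]], [[0.0, 0.0]])
print(len(m), len(n))  # 1 1
```

The two resulting sub-clusters are what appear in the queue at the start of the next loop.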
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(91, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
(856, 0.7272727272727273, 0.8453509366224365, 0.2727272727272727)
Current size of match and non-match training data sets: 24 / 64
Selected cluster with (queue ordering: random):
- Purity 0.73 and entropy 0.85
- Size 856 weight vectors
- Estimated match proportion 0.273
Sample size for this cluster: 70
Farthest first selection of 70 weight vectors from 856 vectors
The selected farthest weight vectors are:
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.667, 0.737, 0.833, 0.818, 0.567] (False)
[0.733, 1.000, 0.100, 0.135, 0.095, 0.176, 0.282] (False)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[1.000, 0.000, 0.792, 0.500, 0.550, 0.000, 0.000] (False)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.238, 0.727, 0.429] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.907, 1.000, 0.667, 0.118, 0.091, 0.063, 0.188] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 18 matches and 52 non-matches
Purity of oracle classification: 0.743
Entropy of oracle classification: 0.822
Number of true matches: 18
Number of false matches: 0
Number of true non-matches: 52
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)3_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 3), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)3_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 655
Number of entity ID pairs that occurred more than once: 0
Analyse set of 655 weight vectors
Containing 213 true matches and 442 true non-matches
(32.52% true matches)
Identified 618 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 600 (97.09%)
2 : 15 (2.43%)
3 : 2 (0.32%)
19 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 618 unique weight vectors)
Pureness (as a percentage of matches) for each unique weight vector:
Pureness : Count
1.000 : 178
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 439
Removed 1 non-pure weight vector
Final number of weight vectors to use: 654
Number of unique weight vectors: 618
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(618, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 618 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 618 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.385, 0.826, 0.429, 0.769, 0.588] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 29 matches and 54 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.934
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 535 weight vectors
Based on 29 matches and 54 non-matches
Classified 152 matches and 383 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(152, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
(383, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
Current size of match and non-match training data sets: 29 / 54
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 383 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 71
Farthest first selection of 71 weight vectors from 383 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.385, 0.714, 0.500, 0.647, 0.643] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.433, 1.000, 0.161, 0.172, 0.107, 0.185, 0.000] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.667, 0.000, 0.750, 0.810, 0.333, 0.714, 0.400] (False)
[1.000, 0.000, 0.750, 0.538, 0.409, 0.548, 0.357] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[1.000, 0.000, 0.269, 0.677, 0.684, 0.385, 0.524] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.370, 0.321, 0.600, 0.650, 0.643] (False)
[1.000, 0.000, 0.857, 0.452, 0.526, 0.278, 0.619] (False)
[0.667, 0.000, 0.550, 0.467, 0.706, 0.444, 0.789] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.500, 0.565, 0.857, 0.538, 0.786] (False)
[1.000, 0.000, 0.304, 0.452, 0.526, 0.294, 0.810] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 0.000, 0.870, 0.619, 0.643, 0.700, 0.524] (False)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.625, 0.857, 0.667, 0.786, 0.529] (False)
[1.000, 0.000, 0.433, 0.867, 0.833, 0.636, 0.737] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.550, 0.737, 0.833, 0.278, 0.533] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[0.667, 0.000, 0.769, 0.739, 0.786, 0.692, 0.367] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[0.720, 1.000, 0.333, 0.333, 0.333, 0.667, 0.667] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.500, 0.714, 0.450, 0.412, 0.875] (False)
[1.000, 0.000, 0.526, 0.792, 0.261, 0.733, 0.471] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.538, 0.613, 0.789, 0.227, 0.857] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.857, 0.636, 0.818, 0.174, 0.556] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.677, 0.000, 0.467, 0.613, 0.316, 0.556, 0.652] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
Perform oracle with 100.00% accuracy on 71 weight vectors
The oracle will correctly classify 71 weight vectors and wrongly classify 0
Classified 4 matches and 67 non-matches
Purity of oracle classification: 0.944
Entropy of oracle classification: 0.313
Number of true matches: 4
Number of false matches: 0
Number of true non-matches: 67
Number of false non-matches: 0
Deleted 71 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing the file: diverg(20)449_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 449), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)449_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 548
Number of entity ID pairs that occurred more than once: 0
Analyse set of 548 weight vectors
Containing 226 true matches and 322 true non-matches
(41.24% true matches)
Identified 509 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 490 (96.27%)
2 : 16 (3.14%)
3 : 2 (0.39%)
20 : 1 (0.20%)
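A frequency distribution like the one above can be computed with `collections.Counter` (a sketch, assuming weight vectors are sequences of floats; the helper name is mine):

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """How many unique weight vectors occur once, twice, three times, ..."""
    per_vector = Counter(tuple(v) for v in weight_vectors)  # vector -> count
    return Counter(per_vector.values())  # count -> number of unique vectors

vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9)]
print(occurrence_distribution(vectors))  # Counter({1: 2, 2: 1})
```

In the run above this yields 490 unique vectors occurring once, 16 occurring twice, 2 three times, and 1 twenty times, which sums back to the 548 input vectors.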
Identified 1 non-pure unique weight vector (from 509 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 319
Removed 1 non-pure weight vector
Final number of weight vectors to use: 547
Number of unique weight vectors: 509
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(509, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 509 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 81
Perform initial selection using "far" method
Farthest first selection of 81 weight vectors from 509 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 1.000, 0.171, 0.140, 0.105, 0.206, 1.000] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
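Farthest-first selection, as used above, is a greedy traversal: starting from one vector, it repeatedly adds the vector whose minimum distance to the already selected set is largest. A minimal sketch (Euclidean distance and a deterministic first pick are assumptions; the tool itself may seed the start differently):

```python
import math

def farthest_first(vectors, k):
    """Greedily pick k vectors that are maximally spread out."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    selected = [vectors[0]]  # deterministic start (an assumption)
    remaining = list(vectors[1:])
    while len(selected) < k and remaining:
        # Pick the remaining vector farthest from its nearest selected one.
        best = max(remaining, key=lambda v: min(dist(v, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

vecs = [[0.0], [1.0], [0.5], [0.9]]
print(farthest_first(vecs, 3))  # [[0.0], [1.0], [0.5]]
```

This greedy spread explains why the sampled vectors above mix extreme all-1.000 and all-low vectors rather than resembling the cluster average.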
Perform oracle with 100.00% accuracy on 81 weight vectors
The oracle will correctly classify 81 weight vectors and wrongly classify 0
Classified 33 matches and 48 non-matches
Purity of oracle classification: 0.593
Entropy of oracle classification: 0.975
Number of true matches: 33
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
Deleted 81 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 428 weight vectors
Based on 33 matches and 48 non-matches
Classified 152 matches and 276 non-matches
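The SVM step trains on the oracle-labelled vectors and classifies the rest of the cluster. A sketch using scikit-learn's `SVC` (the kernel choice and toy data are assumptions; the log does not say which kernel or parameters the tool uses):

```python
from sklearn.svm import SVC

# Toy stand-ins for the oracle-labelled training data:
# label 1 = match, 0 = non-match (the real vectors have 7 similarity weights).
X_train = [[0.1, 0.1], [0.2, 0.0], [0.9, 0.8], [1.0, 1.0]]
y_train = [0, 0, 1, 1]

clf = SVC(kernel="linear")  # kernel choice is an assumption
clf.fit(X_train, y_train)

# Classify the remaining, unlabelled weight vectors of the cluster:
X_rest = [[0.15, 0.05], [0.95, 0.9]]
print(clf.predict(X_rest).tolist())  # [0, 1]
```

The predictions then split the cluster into candidate match and non-match sub-clusters, which are pushed back onto the queue, as the "Loop 2: Queue length: 2" line below shows.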
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 81
Size, purity, entropy, and estimated match proportion of clusters in queue:
(152, 0.5925925925925926, 0.975119064940866, 0.4074074074074074)
(276, 0.5925925925925926, 0.975119064940866, 0.4074074074074074)
Current size of match and non-match training data sets: 33 / 48
Selected cluster (queue ordering: random) with:
- Purity 0.59 and entropy 0.98
- Size 152 weight vectors
- Estimated match proportion 0.407
Sample size for this cluster: 58
Farthest first selection of 58 weight vectors from 152 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.833, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 58 weight vectors
The oracle will correctly classify 58 weight vectors and wrongly classify 0
Classified 53 matches and 5 non-matches
Purity of oracle classification: 0.914
Entropy of oracle classification: 0.424
Number of true matches: 53
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 58 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)598_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 598), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)598_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 862
Number of entity ID pairs that occurred more than once: 0
Analyse set of 862 weight vectors
Containing 227 true matches and 635 true non-matches
(26.33% true matches)
Identified 805 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 768 (95.40%)
2 : 34 (4.22%)
3 : 2 (0.25%)
20 : 1 (0.12%)
Identified 1 non-pure unique weight vector (from 805 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 614
Removed 1 non-pure weight vector
Final number of weight vectors to use: 861
Number of unique weight vectors: 805
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(805, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 805 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 86
Perform initial selection using "far" method
Farthest first selection of 86 weight vectors from 805 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 86 weight vectors
The oracle will correctly classify 86 weight vectors and wrongly classify 0
Classified 28 matches and 58 non-matches
Purity of oracle classification: 0.674
Entropy of oracle classification: 0.910
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 86 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 719 weight vectors
Based on 28 matches and 58 non-matches
Classified 153 matches and 566 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 86
Size, purity, entropy, and estimated match proportion of clusters in queue:
(153, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
(566, 0.6744186046511628, 0.9103480624345153, 0.32558139534883723)
Current size of match and non-match training data sets: 28 / 58
Selected cluster (queue ordering: random) with:
- Purity 0.67 and entropy 0.91
- Size 153 weight vectors
- Estimated match proportion 0.326
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 153 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.520, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.944, 1.000, 1.000, 1.000, 0.960, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.857, 0.727, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 50 matches and 5 non-matches
Purity of oracle classification: 0.909
Entropy of oracle classification: 0.439
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)2_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.190635
f-measure 0.320225
da 57
dm 0
ndm 0
tp 57
fp 0
tn 4.76529e+07
fn 242
Name: (20, 1 - acm diverg, 2), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)2_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1083
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1083 weight vectors
Containing 209 true matches and 874 true non-matches
(19.30% true matches)
Identified 1036 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1001 (96.62%)
2 : 32 (3.09%)
3 : 2 (0.19%)
12 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1036 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 182
0.917 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1082
Number of unique weight vectors: 1036
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1036, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1036 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1036 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 23 matches and 65 non-matches
Purity of oracle classification: 0.739
Entropy of oracle classification: 0.829
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 65
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 948 weight vectors
Based on 23 matches and 65 non-matches
Classified 101 matches and 847 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(101, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
(847, 0.7386363636363636, 0.828797154590015, 0.26136363636363635)
Current size of match and non-match training data sets: 23 / 65
Selected cluster (queue ordering: random) with:
- Purity 0.74 and entropy 0.83
- Size 847 weight vectors
- Estimated match proportion 0.261
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 847 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00 accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
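The repeated "Farthest first selection of k weight vectors from N vectors" steps above can be sketched as a greedy farthest-first traversal: start from a seed vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. The seed choice, Euclidean metric, and tie-breaking below are assumptions; the original implementation may differ.

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Greedy farthest-first traversal over a list of weight vectors.
    Returns the indices of the k selected vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [seed_index]
    # minimum distance from each candidate to the selected set so far
    min_d = [dist(v, vectors[seed_index]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        far = max(range(len(vectors)), key=lambda i: min_d[i])
        selected.append(far)
        for i, v in enumerate(vectors):
            d = dist(v, vectors[far])
            if d < min_d[i]:
                min_d[i] = d
    return selected
```

Each newly selected vector tightens the remaining candidates' minimum distances, so the sample spreads out over the cluster rather than concentrating near its centre.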
57.0
Analyzing file: diverg(10)716_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.980198
recall 0.331104
f-measure 0.495
da 101
dm 0
ndm 0
tp 99
fp 2
tn 4.76529e+07
fn 200
Name: (10, 1 - acm diverg, 716), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)716_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 186
Number of entity ID pairs that occurred more than once: 0
Analyse set of 186 weight vectors
Containing 149 true matches and 37 true non-matches
(80.11% true matches)
Identified 174 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 166 (95.40%)
2 : 5 (2.87%)
3 : 2 (1.15%)
4 : 1 (0.57%)
Identified 0 non-pure unique weight vectors (from 174 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 137
0.000 : 37
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 186
Number of unique weight vectors: 174
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(174, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 174 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 62
Perform initial selection using "far" method
Farthest first selection of 62 weight vectors from 174 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.900, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.667, 0.867, 0.412, 0.647, 0.571] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.667, 1.000, 0.933, 1.000, 0.947, 1.000, 0.947] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.545, 0.857, 0.750, 0.500, 0.813] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.909, 0.818, 0.700, 0.625, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00 accuracy on 62 weight vectors
The oracle will correctly classify 62 weight vectors and wrongly classify 0
Classified 36 matches and 26 non-matches
Purity of oracle classification: 0.581
Entropy of oracle classification: 0.981
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 26
Number of false non-matches: 0
Deleted 62 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 112 weight vectors
Based on 36 matches and 26 non-matches
Classified 112 matches and 0 non-matches
101.0
Analyzing file: diverg(10)48_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.990566
recall 0.351171
f-measure 0.518519
da 106
dm 0
ndm 0
tp 105
fp 1
tn 4.76529e+07
fn 194
Name: (10, 1 - acm diverg, 48), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)48_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 762
Number of entity ID pairs that occurred more than once: 0
Analyse set of 762 weight vectors
Containing 161 true matches and 601 true non-matches
(21.13% true matches)
Identified 723 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 694 (95.99%)
2 : 26 (3.60%)
3 : 2 (0.28%)
10 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 723 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 142
0.900 : 1 (all weight vectors with this pureness to be removed)
0.000 : 580
Removed 1 non-pure weight vector
Final number of weight vectors to use: 761
Number of unique weight vectors: 723
Time to load and analyse the weight vector file: 0.01 sec
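The analysis printed above (unique weight vectors, the occurrence frequency distribution, and each unique vector's pureness as the fraction of true matches among its occurrences) can be sketched as follows; the function name is illustrative:

```python
from collections import Counter

def analyse_weight_vectors(vectors, labels):
    """Group identical weight vectors, count occurrences, and compute
    each unique vector's pureness (fraction of true-match labels)."""
    counts = Counter(map(tuple, vectors))
    matches = Counter(tuple(v) for v, lab in zip(vectors, labels) if lab)
    # occurrence count -> number of unique vectors occurring that often
    freq = Counter(counts.values())
    pureness = {v: matches[v] / c for v, c in counts.items()}
    return freq, pureness
```

A unique vector whose pureness is strictly between 0 and 1 is "non-pure" (its identical copies carry conflicting labels), which is what triggers the "non-pure weight vectors ... removed" messages in the log.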
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(723, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 723 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 85
Perform initial selection using "far" method
Farthest first selection of 85 weight vectors from 723 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00 accuracy on 85 weight vectors
The oracle will correctly classify 85 weight vectors and wrongly classify 0
Classified 26 matches and 59 non-matches
Purity of oracle classification: 0.694
Entropy of oracle classification: 0.888
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 59
Number of false non-matches: 0
Deleted 85 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 638 weight vectors
Based on 26 matches and 59 non-matches
Classified 85 matches and 553 non-matches
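The "SVM classification of N weight vectors / Based on M matches and K non-matches" step fits a binary classifier on the oracle-labelled vectors and splits the rest of the cluster into predicted matches and non-matches. A minimal sketch, using scikit-learn's `SVC` with a linear kernel as a stand-in (both the library and the kernel choice are assumptions about the original code):

```python
from sklearn.svm import SVC

def svm_split(labelled_vectors, labels, unlabelled_vectors):
    """Train an SVM on oracle-labelled weight vectors and predict
    match (1) / non-match (0) labels for the remaining vectors."""
    clf = SVC(kernel="linear")
    clf.fit(labelled_vectors, labels)
    return clf.predict(unlabelled_vectors)
```

The two predicted groups then become the two new clusters pushed onto the queue in the next loop.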
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 85
Size, purity, entropy, and estimated match proportion of clusters in queue:
(85, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
(553, 0.6941176470588235, 0.8883630233845602, 0.3058823529411765)
Current size of match and non-match training data sets: 26 / 59
Selected cluster with (queue ordering: random):
- Purity 0.69 and entropy 0.89
- Size 85 weight vectors
- Estimated match proportion 0.306
Sample size for this cluster: 42
Farthest first selection of 42 weight vectors from 85 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[1.000, 1.000, 1.000, 1.000, 0.971, 0.952, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.975, 1.000, 1.000, 1.000, 0.824, 0.786, 1.000] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00 accuracy on 42 weight vectors
The oracle will correctly classify 42 weight vectors and wrongly classify 0
Classified 41 matches and 1 non-match
Purity of oracle classification: 0.976
Entropy of oracle classification: 0.162
Number of true matches: 41
Number of false matches: 0
Number of true non-matches: 1
Number of false non-matches: 0
Deleted 42 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
106.0
Analyzing file: diverg(10)14_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.19398
f-measure 0.32493
da 58
dm 0
ndm 0
tp 58
fp 0
tn 4.76529e+07
fn 241
Name: (10, 1 - acm diverg, 14), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)14_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 698
Number of entity ID pairs that occurred more than once: 0
Analyse set of 698 weight vectors
Containing 198 true matches and 500 true non-matches
(28.37% true matches)
Identified 653 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 619 (94.79%)
2 : 31 (4.75%)
3 : 2 (0.31%)
11 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 653 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 173
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 479
Removed 1 non-pure weight vector
Final number of weight vectors to use: 697
Number of unique weight vectors: 653
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(653, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 653 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 653 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00 accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 25 matches and 58 non-matches
Purity of oracle classification: 0.699
Entropy of oracle classification: 0.883
Number of true matches: 25
Number of false matches: 0
Number of true non-matches: 58
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 570 weight vectors
Based on 25 matches and 58 non-matches
Classified 143 matches and 427 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(143, 0.6987951807228916, 0.8827586787955115, 0.30120481927710846)
(427, 0.6987951807228916, 0.8827586787955115, 0.30120481927710846)
Current size of match and non-match training data sets: 25 / 58
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 143 weight vectors
- Estimated match proportion 0.301
Sample size for this cluster: 52
Farthest first selection of 52 weight vectors from 143 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.880, 1.000, 1.000, 0.929, 1.000, 0.889, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.875, 1.000, 1.000, 1.000, 1.000] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
Perform oracle with 100.00% accuracy on 52 weight vectors
The oracle will correctly classify 52 weight vectors and wrongly classify 0
Classified 49 matches and 3 non-matches
Purity of oracle classification: 0.942
Entropy of oracle classification: 0.318
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
Deleted 52 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
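The purity and entropy reported for each oracle classification above follow the usual cluster-quality definitions: purity is the fraction of the majority class, and entropy is the binary Shannon entropy of the match/non-match split (49 matches and 3 non-matches give 0.942 and 0.318). A minimal sketch — the function name is ours, not the tool's:

```python
import math

def cluster_quality(num_matches: int, num_non_matches: int):
    """Purity = majority-class fraction; entropy = binary Shannon entropy.

    Illustrative helper, not the original tool's API.
    """
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# 49 matches / 3 non-matches -> purity ~0.942, entropy ~0.318
print(cluster_quality(49, 3))
```

The same function reproduces the later blocks as well, e.g. 24 matches / 63 non-matches gives the purity 0.724 and entropy 0.850 shown in Loop 2.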
58.0
Analysing file: diverg(20)202_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (20, 1 - acm diverg, 202), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)202_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 971
Number of entity ID pairs that occurred more than once: 0
Analyse set of 971 weight vectors
Containing 219 true matches and 752 true non-matches
(22.55% true matches)
Identified 916 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 880 (96.07%)
2 : 33 (3.60%)
3 : 2 (0.22%)
19 : 1 (0.11%)
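An occurrence distribution like the one above (how many unique weight vectors occur once, twice, and so on) can be reproduced with two `collections.Counter` passes; representing each weight vector as a tuple is an assumption about the data layout, not a confirmed detail of the tool:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Map 'occurrence count' -> 'number of unique vectors with that count'."""
    per_vector = Counter(map(tuple, weight_vectors))   # vector -> occurrences
    distribution = Counter(per_vector.values())        # occurrences -> #vectors
    return dict(sorted(distribution.items()))

# Toy example: three distinct vectors, one of them repeated.
vecs = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3), (0.9, 0.9)]
print(occurrence_distribution(vecs))  # {1: 2, 2: 1}
```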
Identified 1 non-pure unique weight vector (from 916 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 731
Removed 1 non-pure weight vector
Final number of weight vectors to use: 970
Number of unique weight vectors: 916
Time to load and analyse the weight vector file: 0.01 sec
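The removal of non-pure weight vectors above (the vector occurring 19 times was seen as 18 matches and 1 non-match, pureness 18/19 = 0.947, so the single minority-class copy is dropped) can be sketched as follows; the names and the (vector, label) pair layout are illustrative assumptions:

```python
from collections import Counter

def remove_minority_labels(labelled_vectors):
    """For unique vectors seen with both labels, drop the minority-label copies.

    labelled_vectors: iterable of (vector_tuple, is_match) pairs.
    On an exact 50/50 tie there is no minority, so both sides are kept.
    """
    counts = Counter(labelled_vectors)  # (vector, label) -> occurrences
    kept = []
    for (vec, label), n in counts.items():
        other = counts.get((vec, not label), 0)
        if n >= other:                  # keep only the majority label
            kept.extend([(vec, label)] * n)
    return kept

# One vector seen as 18 matches and 1 non-match (pureness 18/19 = 0.947):
data = [((1.0, 1.0), True)] * 18 + [((1.0, 1.0), False)] + [((0.1, 0.2), False)]
print(len(remove_minority_labels(data)))  # 19
```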
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(916, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 916 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 916 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 24 matches and 63 non-matches
Purity of oracle classification: 0.724
Entropy of oracle classification: 0.850
Number of true matches: 24
Number of false matches: 0
Number of true non-matches: 63
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 829 weight vectors
Based on 24 matches and 63 non-matches
Classified 123 matches and 706 non-matches
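The "farthest first selection" step used in each loop can be sketched as a standard greedy farthest-first traversal: repeatedly pick the vector whose minimum distance to the already-selected set is largest. The seed choice and the Euclidean metric below are assumptions, not confirmed details of the tool:

```python
import math

def farthest_first(vectors, k):
    """Greedy farthest-first traversal over a list of weight-vector tuples."""
    selected = [vectors[0]]                          # assumed seed: first vector
    min_d = [math.dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        # Pick the vector farthest from everything selected so far.
        i = max(range(len(vectors)), key=lambda j: min_d[j])
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            min_d[j] = min(min_d[j], math.dist(v, vectors[i]))
    return selected

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(farthest_first(pts, 3))
```

Because each newly selected point zeroes out its own minimum distance, duplicates are never picked twice, which matches the selection running over unique weight vectors.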
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(123, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
(706, 0.7241379310344828, 0.8497511372532974, 0.27586206896551724)
Current size of match and non-match training data sets: 24 / 63
Selected cluster with (queue ordering: random):
- Purity 0.72 and entropy 0.85
- Size 706 weight vectors
- Estimated match proportion 0.276
Sample size for this cluster: 69
Farthest first selection of 69 weight vectors from 706 vectors
The selected farthest weight vectors are:
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 1.000, 1.000, 0.806, 0.103, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.956, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[1.000, 0.000, 0.583, 0.444, 0.818, 0.706, 0.857] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.714, 0.304, 0.533, 0.833, 0.529] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 69 weight vectors
The oracle will correctly classify 69 weight vectors and wrongly classify 0
Classified 13 matches and 56 non-matches
Purity of oracle classification: 0.812
Entropy of oracle classification: 0.698
Number of true matches: 13
Number of false matches: 0
Number of true non-matches: 56
Number of false non-matches: 0
Deleted 69 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(20)4_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.981132
recall 0.173913
f-measure 0.295455
da 53
dm 0
ndm 0
tp 52
fp 1
tn 4.76529e+07
fn 247
Name: (20, 1 - acm diverg, 4), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)4_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1041
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1041 weight vectors
Containing 213 true matches and 828 true non-matches
(20.46% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 954 (96.46%)
2 : 32 (3.24%)
3 : 2 (0.20%)
17 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.941 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 807
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1040
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.387, 1.000, 0.146, 0.200, 0.200, 0.111, 0.115] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 26 matches and 61 non-matches
Purity of oracle classification: 0.701
Entropy of oracle classification: 0.880
Number of true matches: 26
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 902 weight vectors
Based on 26 matches and 61 non-matches
Classified 109 matches and 793 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(109, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
(793, 0.7011494252873564, 0.8798813089176425, 0.2988505747126437)
Current size of match and non-match training data sets: 26 / 61
Selected cluster with (queue ordering: random):
- Purity 0.70 and entropy 0.88
- Size 109 weight vectors
- Estimated match proportion 0.299
Sample size for this cluster: 47
Farthest first selection of 47 weight vectors from 109 vectors
The selected farthest weight vectors are:
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 1.000, 0.870, 0.875, 0.867, 0.889, 0.900] (True)
[0.900, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.765] (True)
[0.950, 0.778, 0.938, 0.947, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.933, 1.000, 0.943, 1.000, 0.917, 0.952, 0.913] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.941] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.929, 0.917, 0.857, 0.933] (True)
[1.000, 1.000, 1.000, 1.000, 0.889, 1.000, 1.000] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.644, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.825, 1.000, 0.826, 1.000, 1.000, 1.000, 1.000] (True)
[0.952, 1.000, 0.813, 0.850, 0.824, 0.929, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.941, 1.000, 0.800] (True)
[1.000, 1.000, 0.889, 0.933, 1.000, 1.000, 0.917] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.956, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
Perform oracle with 100.00% accuracy on 47 weight vectors
The oracle will correctly classify 47 weight vectors and wrongly classify 0
Classified 47 matches and 0 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 47
Number of false matches: 0
Number of true non-matches: 0
Number of false non-matches: 0
*** Warning: Oracle returns an empty non-match dictionary ***
Deleted 47 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
53.0
Analysing file: diverg(10)962_NEW.csv
<class 'pandas.core.series.Series'>
Current line right here!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 962), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)962_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 208
Number of entity ID pairs that occurred more than once: 0
Analyse set of 208 weight vectors
Containing 180 true matches and 28 true non-matches
(86.54% true matches)
Identified 190 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 178 (93.68%)
2 : 9 (4.74%)
3 : 2 (1.05%)
6 : 1 (0.53%)
Identified 0 non-pure unique weight vectors (from 190 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 162
0.000 : 28
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 208
Number of unique weight vectors: 190
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(190, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 190 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 64
Perform initial selection using "far" method
Farthest first selection of 64 weight vectors from 190 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.600, 1.000, 1.000, 1.000, 1.000, 0.952, 1.000] (True)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 64 weight vectors
The oracle will correctly classify 64 weight vectors and wrongly classify 0
Classified 42 matches and 22 non-matches
Purity of oracle classification: 0.656
Entropy of oracle classification: 0.928
Number of true matches: 42
Number of false matches: 0
Number of true non-matches: 22
Number of false non-matches: 0
Deleted 64 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 126 weight vectors
Based on 42 matches and 22 non-matches
Classified 126 matches and 0 non-matches
69.0
Analysing file: diverg(10)753_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.143813
f-measure 0.251462
da 43
dm 0
ndm 0
tp 43
fp 0
tn 4.76529e+07
fn 256
Name: (10, 1 - acm diverg, 753), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)753_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 659
Number of entity ID pairs that occurred more than once: 0
Analyse set of 659 weight vectors
Containing 213 true matches and 446 true non-matches
(32.32% true matches)
Identified 607 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 571 (94.07%)
2 : 33 (5.44%)
3 : 2 (0.33%)
16 : 1 (0.16%)
Identified 1 non-pure unique weight vector (from 607 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 181
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 425
Removed 1 non-pure weight vector
Final number of weight vectors to use: 658
Number of unique weight vectors: 607
Time to load and analyse the weight vector file: 0.01 sec
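The uniqueness analysis above (count of unique weight vectors plus the occurrence histogram) can be sketched with `collections.Counter`; the toy vectors below are illustrative, not from the data set:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each distinct weight vector occurs, then
    histogram the counts (occurrence -> number of vectors with it)."""
    counts = Counter(tuple(v) for v in weight_vectors)
    histogram = Counter(counts.values())
    return counts, histogram

vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.9), (0.0, 0.1)]
counts, hist = occurrence_distribution(vectors)
print(len(counts))           # 3 unique weight vectors
print(sorted(hist.items()))  # [(1, 2), (2, 1)]
```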
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(607, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 607 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 607 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.857, 0.250, 0.667, 0.286, 0.600] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.267, 1.000, 0.762, 0.727, 0.619] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 0.442, 0.239, 0.233, 0.171, 0.122] (False)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.450, 0.417, 0.647, 0.000, 0.000] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
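The "far" selection listed above can be sketched as a standard farthest-first traversal: start from one vector and repeatedly pick the vector whose distance to the nearest already-selected vector is largest. The seed choice and the Euclidean metric here are assumptions, not read from the log:

```python
import math

def farthest_first(vectors, k, seed_index=0):
    """Select k vectors; each pick maximizes the distance
    to its nearest already-selected vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [vectors[seed_index]]
    # Minimum distance from each vector to the selected set so far.
    min_d = [dist(v, selected[0]) for v in vectors]
    while len(selected) < k:
        i = max(range(len(vectors)), key=lambda j: min_d[j])
        selected.append(vectors[i])
        for j, v in enumerate(vectors):
            min_d[j] = min(min_d[j], dist(v, vectors[i]))
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(farthest_first(pts, 3))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```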
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 29 matches and 54 non-matches
Purity of oracle classification: 0.651
Entropy of oracle classification: 0.934
Number of true matches: 29
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 524 weight vectors
Based on 29 matches and 54 non-matches
Classified 179 matches and 345 non-matches
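The split step above trains a classifier on the oracle-labelled sample and partitions the remaining vectors into a predicted-match and a predicted-non-match child cluster. The log uses an SVM; to keep this sketch dependency-free it swaps in a nearest-centroid classifier as a simplified stand-in:

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def split_cluster(unlabelled, match_examples, nonmatch_examples):
    """Split remaining vectors into predicted-match / predicted-non-match
    child clusters (nearest-centroid stand-in for the SVM in the log)."""
    cm = centroid(match_examples)
    cn = centroid(nonmatch_examples)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    matches, non_matches = [], []
    for v in unlabelled:
        (matches if sq_dist(v, cm) <= sq_dist(v, cn) else non_matches).append(v)
    return matches, non_matches

m, n = split_cluster([[0.9, 0.8], [0.1, 0.2]], [[1.0, 1.0]], [[0.0, 0.0]])
print(len(m), len(n))  # 1 1
```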
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(179, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
(345, 0.6506024096385542, 0.9335289015212996, 0.3493975903614458)
Current size of match and non-match training data sets: 29 / 54
Selected cluster with (queue ordering: random):
- Purity 0.65 and entropy 0.93
- Size 179 weight vectors
- Estimated match proportion 0.349
Sample size for this cluster: 59
Farthest first selection of 59 weight vectors from 179 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.800, 1.000, 1.000, 0.118, 0.227, 0.082, 0.061] (False)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[1.000, 0.944, 0.842, 0.917, 0.813, 0.871, 0.833] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.954, 1.000, 0.250, 0.154, 0.233, 0.364, 0.190] (False)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.933, 1.000, 0.914, 0.750, 0.917, 0.857, 0.913] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.871, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.942, 1.000, 0.156, 0.172, 0.189, 0.148, 0.133] (False)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
Perform oracle with 100.00% accuracy on 59 weight vectors
The oracle will correctly classify 59 weight vectors and wrongly classify 0
Classified 45 matches and 14 non-matches
Purity of oracle classification: 0.763
Entropy of oracle classification: 0.791
Number of true matches: 45
Number of false matches: 0
Number of true non-matches: 14
Number of false non-matches: 0
Deleted 59 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
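The overall control flow visible in the log (a queue of clusters, an oracle-labelled sample per cluster, a classifier split of the remainder, and a hard manual-classification budget) can be skeletonised as below. The `oracle` and `split` callables are caller-supplied stand-ins, and the head-of-list sampling replaces the farthest-first step for brevity:

```python
from collections import deque

def budgeted_loop(root_cluster, budget, sample_size, oracle, split):
    """Skeleton of the log's main loop: pop a cluster, spend manual
    budget on an oracle-labelled sample, split the remainder into
    child clusters, and repeat until the budget would be exceeded."""
    queue = deque([root_cluster])
    spent = 0
    while queue and spent + sample_size <= budget:
        cluster = queue.popleft()
        sample = cluster[:sample_size]        # stand-in for farthest-first
        labels = [oracle(v) for v in sample]  # manual classifications
        spent += len(sample)
        remainder = cluster[sample_size:]
        for child in (split(remainder, sample, labels) if remainder else ()):
            if child:
                queue.append(child)
    return spent

# Toy run: a budget of 6 allows two samples of 3 before stopping.
spent = budgeted_loop(list(range(10)), budget=6, sample_size=3,
                      oracle=lambda v: v % 2 == 0,
                      split=lambda rest, s, l: (rest[:len(rest) // 2],
                                                rest[len(rest) // 2:]))
print(spent)  # 6
```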
43.0
Analysing the file: diverg(15)774_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.987342
recall 0.26087
f-measure 0.412698
da 79
dm 0
ndm 0
tp 78
fp 1
tn 4.76529e+07
fn 221
Name: (15, 1 - acm diverg, 774), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)774_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1031
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1031 weight vectors
Containing 187 true matches and 844 true non-matches
(18.14% true matches)
Identified 989 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 958 (96.87%)
2 : 28 (2.83%)
3 : 2 (0.20%)
11 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 989 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 165
0.909 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 823
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1030
Number of unique weight vectors: 989
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(989, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 989 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 989 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.800, 1.000, 0.261, 0.158, 0.250, 0.038, 0.282] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.711, 0.000, 0.800, 0.762, 0.857, 0.778, 0.348] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 30 matches and 57 non-matches
Purity of oracle classification: 0.655
Entropy of oracle classification: 0.929
Number of true matches: 30
Number of false matches: 0
Number of true non-matches: 57
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 902 weight vectors
Based on 30 matches and 57 non-matches
Classified 308 matches and 594 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(308, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
(594, 0.6551724137931034, 0.9293636260137187, 0.3448275862068966)
Current size of match and non-match training data sets: 30 / 57
Selected cluster with (queue ordering: random):
- Purity 0.66 and entropy 0.93
- Size 308 weight vectors
- Estimated match proportion 0.345
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 308 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.581, 1.000, 0.091, 0.213, 0.138, 0.206, 0.083] (False)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[0.622, 1.000, 0.243, 0.000, 0.042, 0.156, 0.120] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.875, 1.000, 0.250, 0.333, 0.214, 0.122, 0.111] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.778, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
[0.875, 1.000, 0.574, 0.227, 0.167, 0.117, 0.196] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.781, 1.000, 0.231, 0.183, 0.114, 0.245, 0.000] (False)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[0.854, 1.000, 0.128, 0.163, 0.042, 0.121, 0.138] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[0.967, 1.000, 1.000, 0.867, 0.875, 1.000, 0.882] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[0.751, 1.000, 0.842, 0.000, 0.000, 0.000, 0.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.781, 1.000, 0.407, 0.415, 0.258, 0.222, 0.219] (False)
[0.857, 1.000, 0.000, 0.263, 0.190, 0.136, 0.000] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 40 matches and 28 non-matches
Purity of oracle classification: 0.588
Entropy of oracle classification: 0.977
Number of true matches: 40
Number of false matches: 0
Number of true non-matches: 28
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
79.0
Analysing the file: diverg(10)371_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, young man!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.985507
recall 0.227425
f-measure 0.369565
da 69
dm 0
ndm 0
tp 68
fp 1
tn 4.76529e+07
fn 231
Name: (10, 1 - acm diverg, 371), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)371_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 275
Number of entity ID pairs that occurred more than once: 0
Analyse set of 275 weight vectors
Containing 178 true matches and 97 true non-matches
(64.73% true matches)
Identified 254 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 239 (94.09%)
2 : 12 (4.72%)
3 : 2 (0.79%)
6 : 1 (0.39%)
Identified 0 non-pure unique weight vectors (from 254 unique weight vectors)
Pureness (as percentage of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 159
0.000 : 95
Removed 0 non-pure weight vectors
Final number of weight vectors to use: 275
Number of unique weight vectors: 254
Time to load and analyse the weight vector file: 0.00 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(254, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 254 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 70
Perform initial selection using "far" method
Farthest first selection of 70 weight vectors from 254 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 1.000, 0.944, 0.083, 0.167, 0.063, 0.150] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.778, 0.636, 0.375, 0.556, 0.625] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.900, 0.789, 0.458, 0.185, 0.521] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 0.000, 0.636, 0.909, 0.313, 0.625, 0.556] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 70 weight vectors
The oracle will correctly classify 70 weight vectors and wrongly classify 0
Classified 36 matches and 34 non-matches
Purity of oracle classification: 0.514
Entropy of oracle classification: 0.999
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 34
Number of false non-matches: 0
Deleted 70 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 184 weight vectors
Based on 36 matches and 34 non-matches
Classified 128 matches and 56 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 70
Size, purity, entropy, and estimated match proportion of clusters in queue:
(128, 0.5142857142857142, 0.9994110647387553, 0.5142857142857142)
(56, 0.5142857142857142, 0.9994110647387553, 0.5142857142857142)
Current size of match and non-match training data sets: 36 / 34
Selected cluster with (queue ordering: random):
- Purity 0.51 and entropy 1.00
- Size 128 weight vectors
- Estimated match proportion 0.514
Sample size for this cluster: 55
Farthest first selection of 55 weight vectors from 128 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.857, 1.000, 0.930, 0.912, 1.000, 0.936, 1.000] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[1.000, 1.000, 0.944, 1.000, 0.900, 0.938, 0.867] (True)
[0.867, 1.000, 0.700, 0.667, 0.765, 0.667, 0.750] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.833, 1.000] (True)
[0.960, 1.000, 1.000, 1.000, 1.000, 1.000, 0.882] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[0.789, 1.000, 0.920, 0.867, 1.000, 0.909, 0.864] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.922, 1.000, 1.000, 1.000, 1.000, 0.933, 0.710] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 0.970, 0.750, 1.000, 0.905, 1.000] (True)
[1.000, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.875, 1.000, 1.000, 0.833] (True)
[0.956, 0.694, 1.000, 1.000, 0.969, 1.000, 0.950] (True)
[0.467, 1.000, 0.917, 0.842, 0.882, 0.171, 0.091] (False)
[0.993, 1.000, 0.905, 0.875, 1.000, 0.833, 0.909] (True)
[1.000, 1.000, 1.000, 0.909, 0.706, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[1.000, 1.000, 0.850, 0.125, 0.278, 0.118, 0.167] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.967, 1.000, 0.889, 1.000, 0.857, 0.644, 0.913] (True)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.933, 1.000, 0.800, 0.964, 0.933, 1.000, 1.000] (True)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 1.000, 0.923, 0.842, 0.824, 1.000] (True)
[1.000, 1.000, 0.929, 0.824, 0.955, 1.000, 0.938] (True)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.833, 1.000, 0.913, 1.000, 1.000, 0.957, 0.875] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 0.929, 0.800, 0.857, 0.857, 0.846] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[0.436, 1.000, 1.000, 1.000, 1.000, 0.920, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
Perform oracle with 100.00% accuracy on 55 weight vectors
The oracle will correctly classify 55 weight vectors and wrongly classify 0
Classified 51 matches and 4 non-matches
Purity of oracle classification: 0.927
Entropy of oracle classification: 0.376
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 4
Number of false non-matches: 0
Deleted 55 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
69.0
Analysing the file: diverg(20)864_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 864), dtype: object
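The precision, recall, and f-measure rows above are consistent with the tp/fp/fn counts in the same block (e.g. recall 0.130435 = 39 / (39 + 260)). A sketch of the standard definitions (a hypothetical helper, not the tool's own code):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```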
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)864_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1101
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1101 weight vectors
Containing 227 true matches and 874 true non-matches
(20.62% true matches)
Identified 1044 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 1007 (96.46%)
2 : 34 (3.26%)
3 : 2 (0.19%)
20 : 1 (0.10%)
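The occurrence histogram above ("Occurrence : Number of weight vectors that occur that often") can be built with two nested Counters. A sketch with toy vectors:

```python
from collections import Counter

# Toy weight vectors (tuples so they are hashable)
vectors = [(1.0, 0.5), (1.0, 0.5), (0.2, 0.3),
           (0.9, 0.9), (0.9, 0.9), (0.9, 0.9)]

occ = Counter(vectors)        # unique vector -> how often it occurs
dist = Counter(occ.values())  # occurrence count -> number of unique vectors
```

Here `dist` maps 2 -> 1 unique vector, 3 -> 1, and 1 -> 1, matching the shape of the logged table.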
Identified 1 non-pure unique weight vector (from 1044 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 853
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1100
Number of unique weight vectors: 1044
Time to load and analyse the weight vector file: 0.01 sec
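The non-pure-vector cleanup above (the 0.950-pureness vector losing its minority copies: 1101 vectors minus 1 gives the final 1100) amounts to grouping records by unique weight vector and dropping the minority label. A sketch under that assumption; in the toy data below, a vector seen 20 times with 19 matches mirrors the logged 0.950 case:

```python
from collections import defaultdict

def remove_minority(records):
    """records: list of (weight_vector, is_match) pairs.
    For each unique vector carrying both labels, keep only the
    majority-class copies (ties go to the match class here)."""
    by_vec = defaultdict(list)
    for vec, label in records:
        by_vec[vec].append(label)
    kept = []
    for vec, labels in by_vec.items():
        majority = sum(labels) * 2 >= len(labels)  # True if matches dominate
        kept.extend((vec, lbl) for lbl in labels if lbl == majority)
    return kept
```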
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1044, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1044 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1044 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 28 matches and 60 non-matches
Purity of oracle classification: 0.682
Entropy of oracle classification: 0.902
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 60
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 956 weight vectors
Based on 28 matches and 60 non-matches
Classified 156 matches and 800 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(156, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
(800, 0.6818181818181818, 0.9023932827949789, 0.3181818181818182)
Current size of match and non-match training data sets: 28 / 60
Selected cluster with (queue ordering: random):
- Purity 0.68 and entropy 0.90
- Size 156 weight vectors
- Estimated match proportion 0.318
Sample size for this cluster: 54
Farthest first selection of 54 weight vectors from 156 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00 accuracy on 54 weight vectors
The oracle will correctly classify 54 weight vectors and wrongly classify 0
Classified 49 matches and 5 non-matches
Purity of oracle classification: 0.907
Entropy of oracle classification: 0.445
Number of true matches: 49
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 54 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(20)758_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 758), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)758_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1076
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1076 weight vectors
Containing 227 true matches and 849 true non-matches
(21.10% true matches)
Identified 1019 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 982 (96.37%)
2 : 34 (3.34%)
3 : 2 (0.20%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1019 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 190
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 828
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1075
Number of unique weight vectors: 1019
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1019, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 1019 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 87
Perform initial selection using "far" method
Farthest first selection of 87 weight vectors from 1019 vectors
The selected farthest weight vectors are:
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.917, 0.000, 0.900, 0.731, 0.273, 0.355, 0.235] (False)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 0.188, 0.140, 0.132, 0.162, 1.000] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.400, 0.833, 0.579, 0.000, 0.000] (False)
[0.800, 0.000, 0.667, 0.273, 0.500, 0.037, 0.143] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.500, 0.111, 0.235, 0.409, 0.316] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.455, 0.571, 0.500, 0.600, 0.400] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.808, 0.217, 0.714, 0.538, 0.455] (False)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[0.698, 1.000, 0.431, 0.345, 0.333, 0.323, 0.039] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.650, 0.360, 0.100, 0.348, 0.500] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.211, 0.190, 0.471, 0.267, 0.615] (False)
[1.000, 0.000, 0.259, 0.818, 0.500, 0.250, 0.556] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[0.667, 0.000, 0.440, 0.818, 0.318, 0.750, 0.273] (False)
[1.000, 0.000, 0.292, 0.909, 0.750, 0.714, 0.313] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.600, 0.600, 0.688, 0.000, 0.533] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 0.750, 0.375, 1.000, 0.148, 0.214] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
Perform oracle with 100.00% accuracy on 87 weight vectors
The oracle will correctly classify 87 weight vectors and wrongly classify 0
Classified 23 matches and 64 non-matches
Purity of oracle classification: 0.736
Entropy of oracle classification: 0.833
Number of true matches: 23
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 87 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 932 weight vectors
Based on 23 matches and 64 non-matches
Classified 112 matches and 820 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 87
Size, purity, entropy, and estimated match proportion of clusters in queue:
(112, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
(820, 0.735632183908046, 0.8332661971210124, 0.26436781609195403)
Current size of match and non-match training data sets: 23 / 64
Selected cluster with (queue ordering: random):
- Purity 0.74 and entropy 0.83
- Size 820 weight vectors
- Estimated match proportion 0.264
Sample size for this cluster: 68
Farthest first selection of 68 weight vectors from 820 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.875, 0.222, 0.556, 0.296, 0.286] (False)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[1.000, 0.000, 0.400, 0.429, 0.810, 0.364, 0.286] (False)
[1.000, 1.000, 0.194, 0.167, 0.229, 0.222, 0.750] (False)
[1.000, 1.000, 0.129, 0.053, 0.050, 0.533, 0.344] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[1.000, 0.000, 0.067, 0.300, 0.579, 0.889, 0.571] (False)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[1.000, 0.000, 0.043, 0.650, 0.818, 0.471, 0.905] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.767, 1.000, 0.300, 0.250, 0.091, 0.056, 0.076] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[1.000, 0.000, 0.417, 0.909, 0.500, 0.636, 0.889] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.833, 0.000, 0.556, 0.304, 0.267, 0.091, 0.588] (False)
[1.000, 0.000, 0.633, 0.833, 0.524, 0.000, 0.000] (False)
[1.000, 1.000, 0.000, 0.000, 0.143, 0.100, 0.000] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.714, 0.087, 0.267, 0.571, 0.353] (False)
[0.635, 1.000, 1.000, 0.176, 0.214, 0.120, 0.143] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[0.717, 1.000, 0.778, 0.867, 0.875, 1.000, 0.188] (True)
[1.000, 1.000, 0.308, 0.250, 0.381, 0.250, 0.200] (True)
[1.000, 0.000, 0.000, 0.550, 0.688, 0.625, 0.238] (False)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.750, 0.167, 0.182, 0.000, 0.000] (False)
[0.667, 0.000, 0.350, 0.613, 0.632, 0.500, 0.633] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.294, 1.000, 0.128, 0.156, 0.152, 0.167, 0.180] (False)
[0.504, 1.000, 0.167, 0.095, 0.143, 0.135, 1.000] (False)
[1.000, 0.000, 0.833, 0.261, 0.667, 0.846, 0.524] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 0.000, 0.846, 0.643, 0.818, 0.789, 0.875] (False)
[1.000, 0.000, 0.500, 0.179, 0.636, 0.059, 0.000] (False)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
Perform oracle with 100.00% accuracy on 68 weight vectors
The oracle will correctly classify 68 weight vectors and wrongly classify 0
Classified 14 matches and 54 non-matches
Purity of oracle classification: 0.794
Entropy of oracle classification: 0.734
Number of true matches: 14
Number of false matches: 0
Number of true non-matches: 54
Number of false non-matches: 0
Deleted 68 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing the file: diverg(10)201_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, young one!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 0.984127
recall 0.207358
f-measure 0.342541
da 63
dm 0
ndm 0
tp 62
fp 1
tn 4.76529e+07
fn 237
Name: (10, 1 - acm diverg, 201), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)201_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 681
Number of entity ID pairs that occurred more than once: 0
Analyse set of 681 weight vectors
Containing 187 true matches and 494 true non-matches
(27.46% true matches)
Identified 649 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 633 (97.53%)
2 : 13 (2.00%)
3 : 2 (0.31%)
16 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 649 unique weight vectors)
Pureness (as proportion of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 157
0.938 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 491
Removed 1 non-pure weight vector
Final number of weight vectors to use: 680
Number of unique weight vectors: 649
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(649, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster with (queue ordering: random):
- Purity 0.50 and entropy 1.00
- Size 649 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 649 vectors
The selected farthest weight vectors are:
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[1.000, 0.000, 0.667, 0.476, 0.200, 0.500, 0.688] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 0.000, 0.346, 0.769, 0.636, 0.419, 0.364] (False)
[0.673, 0.000, 0.733, 0.737, 0.500, 0.250, 0.652] (False)
[1.000, 0.000, 0.375, 0.385, 0.773, 0.226, 0.313] (False)
[0.950, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.333, 0.346, 0.364, 0.613, 0.364] (False)
[1.000, 0.000, 0.778, 0.900, 0.400, 0.350, 0.563] (False)
[0.667, 0.000, 0.850, 0.750, 0.522, 0.667, 0.300] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[1.000, 0.000, 0.778, 0.875, 0.333, 0.900, 0.444] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.308, 0.609, 0.471, 0.846, 0.714] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[1.000, 0.000, 0.750, 0.929, 0.789, 0.211, 0.545] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.909, 0.500, 0.500, 0.361, 0.625] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.526, 0.536, 0.292, 0.241, 0.208] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 0.000, 0.917, 0.786, 0.667, 0.472, 0.875] (False)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 0.000, 0.583, 0.875, 0.611, 0.833, 0.778] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.391, 1.000, 0.130, 0.150, 0.200, 0.150, 0.074] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[1.000, 0.000, 0.900, 0.643, 0.318, 0.452, 0.286] (False)
[1.000, 0.000, 0.769, 0.609, 0.714, 0.765, 0.524] (False)
[0.667, 0.000, 0.450, 0.692, 0.545, 0.323, 0.167] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667, 0.000, 0.826, 0.467, 0.588, 0.722, 0.810] (False)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.733, 0.000, 0.500, 0.800, 0.500, 0.909, 0.533] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
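The "farthest first" selection used above is a standard greedy traversal: start from one vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest. A minimal sketch — Euclidean distance and a random starting vector are assumptions here; the tool's actual metric and seeding may differ:

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first selection of k vectors.

    Starts from a random vector, then repeatedly adds the vector whose
    minimum Euclidean distance to the selected set is largest.
    """
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vectors)))]
    # Each vector's distance to its nearest already-selected vector
    min_dist = np.linalg.norm(vectors - vectors[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(vectors - vectors[nxt], axis=1))
    return selected
```

Because already-selected vectors have distance zero to the set, they are never re-picked, so the sample spreads out across the weight-vector space — which is why the selections above mix clear matches and clear non-matches.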
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 35 matches and 48 non-matches
Purity of oracle classification: 0.578
Entropy of oracle classification: 0.982
Number of true matches: 35
Number of false matches: 0
Number of true non-matches: 48
Number of false non-matches: 0
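The purity and entropy figures reported for each oracle step are the majority-class fraction and the binary Shannon entropy of the match/non-match split. A small sketch that reproduces the values logged above for 35 matches and 48 non-matches:

```python
from math import log2

def purity_entropy(num_matches, num_non_matches):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy of the match / non-match proportions."""
    total = num_matches + num_non_matches
    p = num_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log2(0) is taken as 0
            entropy -= q * log2(q)
    return purity, entropy
```

A perfectly one-sided sample (as in Loop 2 below, 0 matches / 72 non-matches) gives purity 1.0 and entropy 0.0.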
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 566 weight vectors
Based on 35 matches and 48 non-matches
Classified 259 matches and 307 non-matches
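The SVM classification step can be sketched with scikit-learn's `SVC`: train on the oracle-labelled vectors, then split the unlabelled remainder of the cluster into a predicted-match and a predicted-non-match child cluster. The log does not state the kernel or parameters actually used, so `kernel="linear"` below is an assumption:

```python
import numpy as np
from sklearn.svm import SVC

def split_cluster(train_vecs, train_labels, cluster_vecs):
    """Train an SVM on oracle-labelled weight vectors (1 = match,
    0 = non-match) and split the remaining cluster by prediction."""
    clf = SVC(kernel="linear")  # assumed kernel; not stated in the log
    clf.fit(train_vecs, train_labels)
    pred = clf.predict(cluster_vecs)
    return cluster_vecs[pred == 1], cluster_vecs[pred == 0]
```

Applied to the 566 unclassified vectors with the 35 + 48 training examples above, the two returned arrays correspond to the 259-match and 307-non-match clusters pushed onto the queue.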
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(259, 0.5783132530120482, 0.9822309298084992, 0.42168674698795183)
(307, 0.5783132530120482, 0.9822309298084992, 0.42168674698795183)
Current size of match and non-match training data sets: 35 / 48
Selected cluster (queue ordering: random) with:
- Purity 0.58 and entropy 0.98
- Size 307 weight vectors
- Estimated match proportion 0.422
Sample size for this cluster: 72
Farthest first selection of 72 weight vectors from 307 vectors
The selected farthest weight vectors are:
[0.667, 0.000, 0.600, 0.737, 0.833, 0.700, 0.467] (False)
[1.000, 0.000, 0.478, 0.714, 0.700, 0.824, 0.286] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[1.000, 0.000, 0.684, 0.792, 0.261, 0.467, 0.636] (False)
[0.800, 0.000, 0.526, 0.750, 0.250, 0.204, 0.313] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[0.667, 0.000, 0.450, 0.733, 0.682, 0.516, 0.263] (False)
[1.000, 0.000, 0.632, 0.789, 0.667, 0.407, 0.417] (False)
[0.667, 0.000, 0.300, 0.467, 0.529, 0.722, 0.684] (False)
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.000, 0.778, 0.727, 0.875, 0.833, 0.333] (False)
[1.000, 0.000, 0.458, 0.714, 0.600, 0.194, 0.938] (False)
[1.000, 0.000, 0.481, 0.474, 0.471, 0.773, 0.450] (False)
[1.000, 0.000, 0.846, 0.667, 0.500, 0.194, 0.500] (False)
[0.667, 0.000, 0.500, 0.667, 0.353, 0.556, 0.789] (False)
[0.667, 0.000, 0.600, 0.467, 0.471, 0.722, 0.737] (False)
[1.000, 0.000, 0.233, 0.484, 0.579, 0.455, 0.714] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.667, 0.000, 0.833, 0.526, 0.600, 0.700, 0.500] (False)
[1.000, 0.000, 0.462, 0.667, 0.600, 0.389, 0.615] (False)
[0.704, 0.000, 0.867, 0.789, 0.353, 0.409, 0.739] (False)
[1.000, 0.000, 0.259, 0.290, 0.421, 0.250, 0.429] (False)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.818, 0.636, 0.313, 0.750, 0.600] (False)
[1.000, 0.000, 0.750, 0.417, 0.783, 0.467, 0.563] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[0.667, 0.000, 0.533, 0.737, 0.353, 0.667, 0.478] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[1.000, 0.000, 0.818, 0.538, 0.545, 0.722, 0.313] (False)
[0.436, 0.000, 0.700, 0.533, 0.353, 0.444, 0.783] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.704, 0.375, 0.348, 0.750, 0.727] (False)
[1.000, 0.000, 0.737, 0.714, 0.167, 0.259, 0.250] (False)
[0.680, 0.000, 0.609, 0.737, 0.600, 0.529, 0.696] (False)
[1.000, 0.000, 0.750, 0.722, 0.895, 0.182, 0.316] (False)
[1.000, 0.000, 0.786, 0.619, 0.500, 1.000, 0.500] (False)
[1.000, 0.000, 0.478, 0.857, 0.833, 0.472, 0.762] (False)
[1.000, 0.000, 0.583, 0.452, 0.474, 0.294, 0.667] (False)
[1.000, 0.000, 0.370, 0.450, 0.750, 0.550, 0.529] (False)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[0.533, 0.000, 0.400, 0.684, 0.600, 0.500, 0.565] (False)
[1.000, 0.000, 0.300, 0.577, 0.545, 0.355, 0.263] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.316, 0.867, 0.417, 0.333, 0.271] (False)
[1.000, 0.000, 0.643, 0.538, 0.545, 0.226, 0.286] (False)
[1.000, 0.000, 0.368, 0.710, 0.826, 0.333, 0.429] (False)
[1.000, 0.000, 0.438, 0.677, 0.211, 0.357, 0.524] (False)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 0.000, 0.565, 0.737, 0.588, 0.727, 0.762] (False)
[0.722, 0.000, 0.875, 0.810, 0.571, 0.643, 0.478] (False)
[1.000, 0.000, 0.857, 0.591, 0.636, 0.783, 0.818] (False)
[1.000, 0.000, 0.500, 0.714, 0.235, 0.857, 0.571] (False)
[0.667, 0.000, 0.650, 0.667, 0.353, 0.389, 0.421] (False)
[1.000, 0.000, 0.444, 0.643, 0.889, 0.750, 0.643] (False)
[0.833, 0.000, 0.632, 0.867, 0.500, 0.130, 0.292] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[1.000, 0.000, 0.875, 0.484, 0.474, 0.417, 0.524] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[0.667, 0.000, 0.650, 0.600, 0.706, 0.727, 0.850] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[0.667, 0.000, 0.850, 0.500, 0.708, 0.333, 0.396] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.783, 0.933, 0.417, 0.315, 0.438] (False)
[1.000, 0.000, 0.542, 0.526, 0.850, 0.318, 0.800] (False)
Perform oracle with 100.00% accuracy on 72 weight vectors
The oracle will correctly classify 72 weight vectors and wrongly classify 0
Classified 0 matches and 72 non-matches
Purity of oracle classification: 1.000
Entropy of oracle classification: 0.000
Number of true matches: 0
Number of false matches: 0
Number of true non-matches: 72
Number of false non-matches: 0
*** Warning: Oracle returns an empty match dictionary ***
Deleted 72 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
63.0
Analysing file: diverg(15)735_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (15, 1 - acm diverg, 735), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)735_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 528
Number of entity ID pairs that occurred more than once: 0
Analyse set of 528 weight vectors
Containing 224 true matches and 304 true non-matches
(42.42% true matches)
Identified 489 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 470 (96.11%)
2 : 16 (3.27%)
3 : 2 (0.41%)
20 : 1 (0.20%)
Identified 1 non-pure unique weight vector (from 489 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 187
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 301
Removed 1 non-pure weight vector
Final number of weight vectors to use: 527
Number of unique weight vectors: 489
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
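The frequency distribution reported above — how many unique weight vectors occur once, twice, etc. — is two nested counts over the loaded vectors. A minimal sketch:

```python
from collections import Counter

def occurrence_distribution(weight_vectors):
    """Count how often each unique weight vector occurs, then count how
    many unique vectors share each occurrence count (the 'Occurrence :
    Number of weight vectors' table in the log)."""
    vec_counts = Counter(tuple(v) for v in weight_vectors)
    freq_of_freq = Counter(vec_counts.values())
    return vec_counts, freq_of_freq
```

For the file above this yields {1: 470, 2: 16, 3: 2, 20: 1} over the 489 unique vectors.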
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(489, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 489 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 80
Perform initial selection using "far" method
Farthest first selection of 80 weight vectors from 489 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.867, 1.000, 0.522, 0.680, 0.400] (False)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[1.000, 1.000, 0.476, 0.429, 0.441, 0.367, 0.237] (True)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 0.714, 0.563, 0.278, 0.385] (False)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.300, 0.467, 0.500, 0.818, 0.421] (False)
[1.000, 0.000, 0.391, 0.786, 0.588, 0.706, 0.238] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 0.000, 0.900, 0.636, 0.600, 0.818, 0.444] (False)
[1.000, 0.000, 0.429, 0.417, 0.647, 0.583, 1.000] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.186, 1.000, 1.000, 1.000, 0.333, 0.111, 0.238] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.900, 0.800, 0.267, 0.353, 0.647] (False)
[0.964, 1.000, 0.140, 0.100, 0.188, 0.000, 0.000] (False)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.815, 0.391, 0.571, 0.650, 0.818] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.517, 1.000, 1.000, 0.500, 0.172, 0.036, 0.095] (False)
[1.000, 0.000, 0.615, 0.333, 0.455, 0.333, 0.286] (False)
[1.000, 1.000, 0.500, 0.286, 0.750, 0.613, 0.200] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.727, 0.571, 0.750, 0.167, 0.813] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[1.000, 0.000, 0.200, 0.800, 0.750, 0.611, 0.684] (False)
[1.000, 0.000, 0.500, 0.615, 0.294, 0.211, 0.545] (False)
[0.889, 1.000, 0.232, 0.205, 0.211, 0.205, 0.833] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.592, 1.000, 0.229, 0.261, 0.200, 0.857, 0.972] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[1.000, 0.000, 0.889, 0.667, 0.941, 0.500, 0.333] (False)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 0.000, 0.583, 0.615, 0.778, 0.526, 0.611] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.714] (True)
[0.907, 1.000, 0.619, 0.118, 0.091, 0.063, 0.188] (True)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 0.000, 0.304, 0.714, 0.625, 0.294, 0.238] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
Perform oracle with 100.00% accuracy on 80 weight vectors
The oracle will correctly classify 80 weight vectors and wrongly classify 0
Classified 36 matches and 44 non-matches
Purity of oracle classification: 0.550
Entropy of oracle classification: 0.993
Number of true matches: 36
Number of false matches: 0
Number of true non-matches: 44
Number of false non-matches: 0
Deleted 80 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 409 weight vectors
Based on 36 matches and 44 non-matches
Classified 208 matches and 201 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 80
Size, purity, entropy, and estimated match proportion of clusters in queue:
(208, 0.55, 0.9927744539878084, 0.45)
(201, 0.55, 0.9927744539878084, 0.45)
Current size of match and non-match training data sets: 36 / 44
Selected cluster (queue ordering: random) with:
- Purity 0.55 and entropy 0.99
- Size 201 weight vectors
- Estimated match proportion 0.450
Sample size for this cluster: 65
Farthest first selection of 65 weight vectors from 201 vectors
The selected farthest weight vectors are:
[0.600, 0.944, 0.226, 0.174, 0.000, 0.174, 0.059] (False)
[1.000, 0.000, 0.636, 0.786, 0.750, 0.139, 0.313] (False)
[1.000, 0.000, 0.636, 0.571, 0.667, 0.278, 0.778] (False)
[1.000, 0.000, 0.391, 0.500, 0.625, 0.353, 0.667] (False)
[1.000, 0.000, 0.818, 0.833, 0.412, 0.625, 0.833] (False)
[1.000, 0.000, 0.778, 0.667, 0.833, 0.833, 0.278] (False)
[1.000, 0.000, 0.478, 0.786, 0.500, 0.471, 0.429] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.000, 0.348, 0.867, 0.529, 0.706, 0.524] (False)
[1.000, 0.000, 0.333, 0.667, 0.750, 0.909, 0.842] (False)
[1.000, 0.000, 0.391, 0.500, 0.600, 0.529, 0.381] (False)
[0.603, 1.000, 0.269, 0.169, 0.231, 0.258, 0.116] (False)
[1.000, 0.000, 0.875, 0.375, 0.625, 0.259, 0.214] (False)
[0.450, 1.000, 0.176, 0.121, 0.242, 0.158, 0.217] (False)
[1.000, 0.000, 0.900, 0.833, 0.588, 0.750, 0.278] (False)
[1.000, 0.000, 0.700, 0.727, 0.556, 0.818, 0.778] (False)
[0.344, 1.000, 0.180, 0.255, 0.171, 0.189, 0.000] (False)
[1.000, 0.000, 0.636, 0.727, 0.389, 0.625, 0.333] (False)
[1.000, 0.000, 0.438, 0.571, 0.444, 0.533, 0.611] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 1.000, 0.714, 0.600, 0.647, 0.529] (False)
[1.000, 0.000, 0.500, 0.444, 0.294, 0.182, 0.316] (False)
[0.833, 1.000, 0.077, 0.067, 0.133, 0.214, 0.000] (True)
[1.000, 0.000, 0.818, 0.500, 0.500, 0.250, 0.500] (False)
[1.000, 0.000, 0.233, 0.533, 0.611, 0.909, 0.737] (False)
[1.000, 0.000, 0.818, 0.909, 0.625, 0.500, 0.667] (False)
[1.000, 0.000, 0.636, 0.818, 0.438, 0.313, 0.833] (False)
[1.000, 0.000, 0.700, 0.818, 0.563, 0.455, 0.278] (False)
[1.000, 0.000, 0.429, 0.286, 0.500, 0.500, 0.778] (False)
[0.680, 1.000, 0.150, 0.250, 0.333, 0.192, 0.000] (False)
[1.000, 0.000, 0.818, 0.786, 0.750, 0.306, 0.889] (False)
[1.000, 1.000, 0.154, 0.211, 0.162, 0.000, 0.000] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.750, 1.000, 0.256, 0.080, 0.286, 0.059, 0.229] (False)
[1.000, 0.000, 0.261, 0.857, 0.800, 0.778, 0.619] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[1.000, 0.000, 0.435, 0.786, 0.800, 0.588, 0.810] (False)
[1.000, 0.000, 0.778, 0.875, 0.833, 0.600, 0.722] (False)
[1.000, 0.000, 0.636, 0.727, 0.278, 0.800, 0.500] (False)
[0.857, 0.000, 0.688, 0.500, 0.412, 0.409, 0.684] (False)
[1.000, 0.000, 0.300, 0.467, 0.750, 0.545, 0.684] (False)
[1.000, 0.000, 0.233, 0.667, 0.688, 0.455, 0.263] (False)
[0.917, 0.000, 0.818, 0.636, 0.833, 0.889, 0.889] (False)
[1.000, 0.000, 0.818, 0.917, 0.294, 0.667, 0.556] (False)
[0.800, 0.000, 0.375, 0.571, 0.333, 0.267, 0.333] (False)
[1.000, 0.000, 0.304, 0.571, 0.556, 0.588, 0.762] (False)
[1.000, 0.000, 0.300, 0.733, 0.706, 0.833, 0.263] (False)
[1.000, 0.000, 1.000, 0.583, 0.353, 0.750, 0.625] (False)
[1.000, 0.000, 0.667, 0.500, 0.600, 0.500, 0.615] (False)
[1.000, 0.000, 0.833, 0.667, 0.765, 0.773, 0.579] (False)
[1.000, 0.000, 0.818, 0.727, 0.438, 0.375, 0.400] (False)
[1.000, 0.000, 0.800, 0.636, 0.563, 0.545, 0.722] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[1.000, 0.944, 0.231, 0.111, 0.143, 0.214, 0.333] (False)
[1.000, 0.000, 0.818, 0.786, 0.706, 0.333, 0.313] (False)
[1.000, 0.000, 0.760, 0.917, 0.500, 0.786, 0.500] (False)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 0.000, 0.367, 0.800, 0.833, 0.306, 0.789] (False)
[1.000, 0.000, 1.000, 1.000, 0.950, 1.000, 1.000] (False)
[1.000, 0.000, 0.600, 0.714, 1.000, 0.611, 0.722] (False)
[1.000, 0.000, 0.538, 0.778, 0.636, 0.632, 0.563] (False)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 0.000, 0.909, 0.786, 0.417, 0.222, 0.563] (False)
[0.633, 1.000, 0.414, 0.109, 0.176, 0.153, 0.000] (False)
[0.800, 0.944, 0.308, 0.125, 0.040, 0.000, 0.071] (False)
Perform oracle with 100.00% accuracy on 65 weight vectors
The oracle will correctly classify 65 weight vectors and wrongly classify 0
Classified 1 match and 64 non-matches
Purity of oracle classification: 0.985
Entropy of oracle classification: 0.115
Number of true matches: 1
Number of false matches: 0
Number of true non-matches: 64
Number of false non-matches: 0
Deleted 65 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(20)739_NEW.csv
<class 'pandas.core.series.Series'>
Current row here, youngster!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (20, 1 - acm diverg, 739), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(20)739_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 1092
Number of entity ID pairs that occurred more than once: 0
Analyse set of 1092 weight vectors
Containing 226 true matches and 866 true non-matches
(20.70% true matches)
Identified 1035 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 998 (96.43%)
2 : 34 (3.29%)
3 : 2 (0.19%)
20 : 1 (0.10%)
Identified 1 non-pure unique weight vector (from 1035 unique weight vectors)
Pureness (as fraction of matches) for a certain unique weight vector:
Pureness : Count
1.000 : 189
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 845
Removed 1 non-pure weight vector
Final number of weight vectors to use: 1091
Number of unique weight vectors: 1035
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
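The non-pure vector removal logged above keeps, for each unique weight vector that mixes true matches and true non-matches, only its majority-class copies. A sketch under the assumption that ground truth is available as a mapping from unique vector to its list of True/False labels (`vector_groups` is a hypothetical structure for illustration):

```python
def remove_minority_class(vector_groups):
    """Drop minority-class copies of each non-pure unique weight vector,
    so every remaining unique vector has pureness 0.0 or 1.0.

    vector_groups: dict mapping a unique weight vector (tuple) to the
    list of True/False ground-truth labels of its occurrences.
    Returns the cleaned mapping and the number of copies removed.
    """
    kept, removed = {}, 0
    for vec, labels in vector_groups.items():
        pureness = sum(labels) / len(labels)
        if 0.0 < pureness < 1.0:  # mixed labels: keep majority only
            majority = pureness > 0.5
            n_kept = sum(1 for lab in labels if lab == majority)
            removed += len(labels) - n_kept
            kept[vec] = [majority] * n_kept
        else:
            kept[vec] = list(labels)
    return kept, removed
```

This matches the log above: the single vector with pureness 0.950 occurs 20 times (19 matches, 1 non-match), so 1 copy is removed and 1091 of the 1092 vectors remain.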
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(1035, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 1035 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 88
Perform initial selection using "far" method
Farthest first selection of 88 weight vectors from 1035 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.667, 0.000, 0.760, 0.909, 0.818, 0.500, 0.727] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[1.000, 0.000, 0.407, 0.818, 0.875, 0.800, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.800, 0.000, 0.250, 0.429, 0.467, 0.533, 0.444] (False)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.400, 0.000, 0.750, 0.737, 0.500, 0.800, 0.633] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 0.000, 0.632, 0.714, 0.250, 0.130, 0.167] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[0.667, 0.000, 0.440, 0.909, 0.227, 0.571, 0.409] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.733, 0.000, 0.176, 0.261, 0.216, 0.261, 0.125] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.769, 0.522, 0.786, 0.929, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.476, 0.455, 0.833, 0.636, 0.278] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[1.000, 0.000, 1.000, 0.727, 0.889, 0.500, 0.278] (False)
[0.667, 0.000, 0.792, 0.333, 0.700, 0.389, 0.737] (False)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[1.000, 0.000, 0.368, 0.484, 0.708, 0.093, 0.417] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
Perform oracle with 100.00% accuracy on 88 weight vectors
The oracle will correctly classify 88 weight vectors and wrongly classify 0
Classified 27 matches and 61 non-matches
Purity of oracle classification: 0.693
Entropy of oracle classification: 0.889
Number of true matches: 27
Number of false matches: 0
Number of true non-matches: 61
Number of false non-matches: 0
Deleted 88 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 947 weight vectors
Based on 27 matches and 61 non-matches
Classified 148 matches and 799 non-matches
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 88
Size, purity, entropy, and estimated match proportion of clusters in queue:
(148, 0.6931818181818182, 0.8894663896628687, 0.3068181818181818)
(799, 0.6931818181818182, 0.8894663896628687, 0.3068181818181818)
Current size of match and non-match training data sets: 27 / 61
Selected cluster (queue ordering: random) with:
- Purity 0.69 and entropy 0.89
- Size 148 weight vectors
- Estimated match proportion 0.307
Sample size for this cluster: 53
Farthest first selection of 53 weight vectors from 148 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.903, 0.903, 0.903, 0.903, 0.903] (False)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[1.000, 1.000, 1.000, 1.000, 0.882, 1.000, 1.000] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[0.644, 1.000, 1.000, 1.000, 0.933, 1.000, 0.900] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.500, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.886, 1.000, 0.750, 0.900, 0.889, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[0.783, 1.000, 1.000, 1.000, 1.000, 1.000, 0.850] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[1.000, 1.000, 1.000, 0.938, 1.000, 0.900, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
Perform oracle with 100.00% accuracy on 53 weight vectors
The oracle will correctly classify 53 weight vectors and wrongly classify 0
Classified 50 matches and 3 non-matches
Purity of oracle classification: 0.943
Entropy of oracle classification: 0.314
Number of true matches: 50
Number of false matches: 0
Number of true non-matches: 3
Number of false non-matches: 0
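The purity and entropy figures above follow directly from the 50/3 split of oracle labels: purity is the majority-class fraction and entropy is the binary Shannon entropy of the split, in bits. A minimal sketch (the function name is illustrative, not from the original code):

```python
import math

def purity_and_entropy(n_matches, n_non_matches):
    """Purity = fraction of the majority class; entropy = binary
    Shannon entropy of the match/non-match split, in bits."""
    total = n_matches + n_non_matches
    p = n_matches / total
    purity = max(p, 1.0 - p)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return purity, entropy

# The 50 matches / 3 non-matches classified above:
purity, entropy = purity_and_entropy(50, 3)
print(round(purity, 3), round(entropy, 3))  # 0.943 0.314
```

The same formula reproduces the other figures in this log, e.g. 28 matches / 55 non-matches gives purity 0.663 and entropy 0.922.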
Deleted 53 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
39.0
Analysing file: diverg(15)408_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.133779
f-measure 0.235988
da 40
dm 0
ndm 0
tp 40
fp 0
tn 4.76529e+07
fn 259
Name: (15, 1 - acm diverg, 408), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(15)408_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 701
Number of entity ID pairs that occurred more than once: 0
Analyse set of 701 weight vectors
Containing 219 true matches and 482 true non-matches
(31.24% true matches)
Identified 646 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 610 (94.43%)
2 : 33 (5.11%)
3 : 2 (0.31%)
19 : 1 (0.15%)
Identified 1 non-pure unique weight vector (from 646 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 184
0.947 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 461
Removed 1 non-pure weight vector
Final number of weight vectors to use: 700
Number of unique weight vectors: 646
Time to load and analyse the weight vector file: 0.01 sec
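The bookkeeping above (701 vectors, 646 unique, one non-pure unique vector removed) can be reproduced from the frequency table: the occurrence counts weighted by multiplicity sum to the vector total, and the single non-pure vector occurs 19 times with 18 matches and 1 non-match. A sketch, assuming weight vectors are represented as hashable tuples (the helper name is illustrative):

```python
from collections import Counter

def occurrence_distribution(vectors):
    """Map occurrence count -> number of distinct weight vectors
    occurring that often."""
    freq = Counter(map(tuple, vectors))   # vector -> occurrences
    return Counter(freq.values())         # occurrences -> #distinct vectors

# The distribution reported above for diverg(15)408_NEW.csv:
dist = {1: 610, 2: 33, 3: 2, 19: 1}
total_vectors = sum(k * v for k, v in dist.items())   # 701
unique_vectors = sum(dist.values())                    # 646

# The one non-pure unique vector (19 occurrences: 18 matches, 1
# non-match) has pureness 18/19 = 0.947; removing the minority copy
# leaves 700 of the 701 vectors.
pureness = 18 / 19
```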
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(646, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 646 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 83
Perform initial selection using "far" method
Farthest first selection of 83 weight vectors from 646 vectors
The selected farthest weight vectors are:
[1.000, 0.000, 0.833, 0.500, 0.550, 1.000, 0.313] (False)
[0.667, 0.000, 0.800, 0.786, 0.455, 0.706, 0.909] (False)
[1.000, 0.000, 0.033, 0.650, 0.833, 0.727, 0.762] (False)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.280, 0.818, 0.727, 0.357, 0.227] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.048, 0.250, 0.917, 0.875, 0.238] (False)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.050, 0.250, 0.818, 0.000, 0.000] (False)
[0.667, 0.000, 0.040, 0.550, 0.500, 0.571, 0.909] (False)
[0.337, 1.000, 1.000, 1.000, 1.000, 0.129, 0.174] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[0.795, 1.000, 0.087, 0.154, 0.269, 0.147, 0.156] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[0.667, 0.000, 0.400, 1.000, 0.455, 0.786, 0.727] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 0.000, 0.250, 0.714, 0.500, 0.389, 0.813] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.667, 0.000, 0.720, 0.455, 0.500, 0.000, 0.000] (False)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.467, 0.700, 0.611, 0.000, 0.444] (False)
[0.350, 1.000, 0.643, 0.440, 0.154, 0.739, 0.474] (False)
[1.000, 0.000, 0.067, 0.550, 0.455, 1.000, 0.429] (False)
[1.000, 0.000, 0.556, 0.318, 0.333, 0.200, 0.667] (False)
[0.667, 0.000, 0.760, 0.455, 0.545, 0.500, 0.227] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 0.000, 0.367, 0.545, 0.333, 0.688, 0.286] (False)
[0.983, 0.556, 1.000, 0.846, 0.957, 0.833, 0.944] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[0.808, 1.000, 0.667, 0.317, 0.516, 0.571, 0.594] (True)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.240, 1.000, 1.000, 0.900, 1.000, 0.947, 0.146] (True)
[1.000, 0.000, 0.750, 0.133, 0.235, 0.571, 0.429] (False)
[0.725, 1.000, 1.000, 0.960, 1.000, 0.373, 1.000] (True)
[1.000, 0.000, 0.778, 0.833, 0.882, 0.417, 1.000] (False)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.350, 0.500, 0.536, 0.350, 0.333] (True)
[1.000, 0.000, 0.500, 0.714, 0.000, 0.000, 0.600] (False)
[1.000, 0.000, 0.308, 0.261, 0.684, 0.692, 0.846] (False)
[1.000, 0.000, 0.600, 0.700, 0.545, 0.000, 1.000] (False)
[1.000, 0.000, 1.000, 0.389, 0.778, 0.706, 0.750] (False)
[0.319, 1.000, 0.667, 0.310, 0.219, 0.172, 0.226] (True)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[1.000, 0.000, 0.850, 0.417, 1.000, 0.000, 0.000] (False)
[1.000, 1.000, 0.556, 0.529, 1.000, 0.548, 0.316] (True)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[0.625, 0.000, 0.375, 0.500, 0.267, 0.067, 0.500] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.120, 0.118, 0.160, 1.000] (True)
[1.000, 0.000, 0.000, 0.700, 0.917, 0.222, 0.762] (False)
[0.956, 1.000, 1.000, 0.071, 0.143, 0.111, 0.000] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.750, 0.125, 0.667, 0.185, 0.179] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 0.067] (True)
[1.000, 1.000, 0.846, 0.652, 0.720, 0.500, 0.789] (True)
[1.000, 0.000, 0.286, 0.357, 0.833, 0.389, 0.385] (False)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 0.810, 1.000, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.305, 1.000, 0.900, 0.889, 0.190, 0.226, 0.190] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[1.000, 0.000, 0.917, 0.786, 0.632, 0.111, 0.375] (False)
[0.667, 0.000, 0.360, 0.833, 0.773, 0.786, 0.227] (False)
[1.000, 0.000, 0.176, 0.304, 0.351, 0.217, 0.229] (False)
[1.000, 0.000, 0.917, 0.786, 0.368, 0.250, 0.833] (False)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.000, 0.583, 0.722, 0.889, 0.882, 0.786] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
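The "far" method above is a farthest-first traversal: start from an arbitrary vector, then repeatedly add the vector whose minimum distance to the already-selected set is largest, yielding a diverse sample of the cluster. A minimal sketch (not the original implementation; Euclidean distance and the random seeding are assumptions):

```python
import numpy as np

def farthest_first(vectors, k, seed=0):
    """Greedy farthest-first selection of k row indices."""
    X = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]              # arbitrary start
    min_dist = np.linalg.norm(X - X[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))                # farthest from chosen set
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen
```

Each selected index is then sent to the oracle for labelling, as in the block that follows.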
Perform oracle with 100.00% accuracy on 83 weight vectors
The oracle will correctly classify 83 weight vectors and wrongly classify 0
Classified 28 matches and 55 non-matches
Purity of oracle classification: 0.663
Entropy of oracle classification: 0.922
Number of true matches: 28
Number of false matches: 0
Number of true non-matches: 55
Number of false non-matches: 0
Deleted 83 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
SVM classification of 563 weight vectors
Based on 28 matches and 55 non-matches
Classified 157 matches and 406 non-matches
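The split above trains a classifier on the 83 oracle-labelled vectors and uses it to partition the remaining 563 unlabelled vectors into a predicted-match cluster (157) and a predicted-non-match cluster (406), which are then pushed onto the queue. A sketch using scikit-learn's SVC (the original code's kernel and parameters are unknown; the linear kernel here is an assumption):

```python
from sklearn.svm import SVC

def svm_split(labeled_X, labeled_y, unlabeled_X):
    """Train an SVM on oracle-labelled vectors (y=1 match, y=0
    non-match), then split the unlabelled vectors by prediction."""
    clf = SVC(kernel="linear").fit(labeled_X, labeled_y)
    pred = clf.predict(unlabeled_X)
    matches = [x for x, p in zip(unlabeled_X, pred) if p == 1]
    non_matches = [x for x, p in zip(unlabeled_X, pred) if p == 0]
    return matches, non_matches
```

The two resulting clusters inherit the parent's purity, entropy, and estimated match proportion until they are themselves sampled, which is why both queue entries in Loop 2 carry identical statistics.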
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 2: Queue length: 2
Number of manual oracle classifications performed: 83
Size, purity, entropy, and estimated match proportion of clusters in queue:
(157, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
(406, 0.6626506024096386, 0.922259647473802, 0.3373493975903614)
Current size of match and non-match training data sets: 28 / 55
Selected cluster (queue ordering: random) with:
- Purity 0.66 and entropy 0.92
- Size 157 weight vectors
- Estimated match proportion 0.337
Sample size for this cluster: 56
Farthest first selection of 56 weight vectors from 157 vectors
The selected farthest weight vectors are:
[1.000, 1.000, 0.889, 0.800, 0.500, 0.346, 0.833] (True)
[0.433, 1.000, 1.000, 1.000, 0.824, 1.000, 1.000] (True)
[0.911, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 1.000, 0.769, 0.636, 0.667, 0.667, 0.615] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.333, 0.356] (True)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[0.876, 0.756, 0.935, 1.000, 0.875, 0.882, 0.267] (True)
[0.496, 1.000, 0.800, 0.857, 0.833, 0.762, 0.800] (True)
[1.000, 1.000, 0.846, 0.209, 0.600, 0.500, 0.711] (True)
[1.000, 1.000, 1.000, 1.000, 0.650, 1.000, 0.950] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.380, 1.000, 1.000, 0.933, 1.000, 1.000, 0.714] (True)
[1.000, 1.000, 1.000, 1.000, 0.875, 0.769, 1.000] (True)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[0.644, 1.000, 0.870, 0.875, 0.800, 0.889, 0.800] (True)
[0.889, 1.000, 0.938, 0.905, 0.696, 0.897, 0.941] (True)
[0.550, 1.000, 0.833, 0.842, 0.923, 1.000, 0.882] (True)
[0.867, 1.000, 0.700, 0.667, 0.824, 0.667, 0.667] (True)
[1.000, 1.000, 0.885, 0.727, 0.778, 0.750, 0.692] (True)
[1.000, 1.000, 1.000, 0.842, 0.786, 1.000, 1.000] (True)
[0.807, 1.000, 0.800, 0.842, 1.000, 1.000, 0.875] (True)
[1.000, 1.000, 1.000, 0.372, 0.833, 1.000, 0.839] (True)
[0.855, 1.000, 1.000, 0.885, 1.000, 0.773, 1.000] (True)
[1.000, 1.000, 0.500, 0.257, 0.750, 0.567, 0.550] (True)
[0.495, 1.000, 1.000, 1.000, 1.000, 0.000, 0.071] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 0.148, 0.333] (True)
[1.000, 1.000, 1.000, 0.750, 1.000, 1.000, 1.000] (True)
[0.280, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.630, 0.697, 0.607, 0.615, 0.192] (True)
[1.000, 0.667, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.789, 0.875, 0.800, 0.867, 0.625] (True)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[1.000, 1.000, 1.000, 0.979, 0.974, 0.357, 0.535] (True)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[0.967, 1.000, 0.800, 0.929, 0.917, 0.714, 0.867] (True)
[1.000, 0.778, 0.643, 0.667, 0.792, 0.833, 0.706] (True)
[0.845, 1.000, 0.849, 0.851, 0.683, 0.418, 0.519] (False)
[1.000, 1.000, 0.933, 0.867, 0.867, 0.600, 0.600] (False)
[0.567, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 1.000, 0.800, 0.909, 1.000, 0.933, 1.000] (True)
[0.870, 1.000, 0.875, 0.857, 0.600, 0.645, 0.882] (True)
[1.000, 1.000, 0.778, 0.906, 0.769, 0.885, 0.864] (True)
[0.833, 1.000, 0.917, 1.000, 1.000, 1.000, 1.000] (True)
[0.876, 1.000, 0.733, 0.900, 0.900, 0.095, 0.250] (True)
[0.750, 1.000, 0.778, 1.000, 1.000, 1.000, 0.176] (True)
[0.812, 1.000, 0.846, 0.955, 0.909, 0.800, 1.000] (True)
[0.867, 1.000, 1.000, 0.867, 0.875, 1.000, 0.824] (True)
[1.000, 1.000, 0.895, 0.783, 0.903, 0.800, 0.867] (True)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.875] (True)
[0.981, 1.000, 0.938, 0.952, 0.720, 0.677, 0.941] (True)
[1.000, 1.000, 0.818, 0.800, 0.714, 0.750, 1.000] (True)
[0.917, 1.000, 0.800, 0.833, 0.769, 0.750, 0.778] (True)
[0.975, 1.000, 0.750, 0.889, 0.333, 0.833, 0.813] (True)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
Perform oracle with 100.00% accuracy on 56 weight vectors
The oracle will correctly classify 56 weight vectors and wrongly classify 0
Classified 51 matches and 5 non-matches
Purity of oracle classification: 0.911
Entropy of oracle classification: 0.434
Number of true matches: 51
Number of false matches: 0
Number of true non-matches: 5
Number of false non-matches: 0
Deleted 56 weight vectors (classified by oracle) from cluster
Cluster not pure enough or too large, and can be split further
Reached end of manual classification budget
40.0
Analysing file: diverg(10)278_NEW.csv
<class 'pandas.core.series.Series'>
Current line here, kid!
(13,)
abordagem DS
iteracao 0
inspecoesManuais 0
precision 1
recall 0.130435
f-measure 0.230769
da 39
dm 0
ndm 0
tp 39
fp 0
tn 4.76529e+07
fn 260
Name: (10, 1 - acm diverg, 278), dtype: object
Load weight vector file: ../csv/conjuntosDS/conjuntosDivergAA/diverg(10)278_NEW.csv
Weights to use: ['title', 'artist', 'track01', 'track02', 'track03', 'track10', 'track11']
Number of weight vectors: 730
Number of entity ID pairs that occurred more than once: 0
Analyse set of 730 weight vectors
Containing 220 true matches and 510 true non-matches
(30.14% true matches)
Identified 694 unique weight vectors
Frequency distribution of occurrences of weight vectors:
Occurrence : Number of weight vectors that occur that often
1 : 678 (97.69%)
2 : 13 (1.87%)
3 : 2 (0.29%)
20 : 1 (0.14%)
Identified 1 non-pure unique weight vector (from 694 unique weight vectors)
Pureness (as fraction of matches) for each unique weight vector:
Pureness : Count
1.000 : 184
0.950 : 1 (minority class weight vectors with this pureness to be removed)
0.000 : 509
Removed 1 non-pure weight vector
Final number of weight vectors to use: 729
Number of unique weight vectors: 694
Time to load and analyse the weight vector file: 0.01 sec
Initial estimated match proportion: 0.500
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Loop 1: Queue length: 1
Number of manual oracle classifications performed: 0
Size, purity, entropy, and estimated match proportion of clusters in queue:
(694, 0.5, 1.0, 0.5)
Current size of match and non-match training data sets: 0 / 0
Selected cluster (queue ordering: random) with:
- Purity 0.50 and entropy 1.00
- Size 694 weight vectors
- Estimated match proportion 0.500
Sample size for this cluster: 84
Perform initial selection using "far" method
Farthest first selection of 84 weight vectors from 694 vectors
The selected farthest weight vectors are:
[0.500, 0.000, 0.417, 0.500, 0.300, 0.636, 0.765] (False)
[1.000, 0.000, 0.474, 0.577, 0.708, 0.519, 0.104] (False)
[0.617, 1.000, 0.778, 0.867, 0.875, 1.000, 0.176] (True)
[1.000, 1.000, 0.000, 0.042, 0.050, 0.607, 0.339] (False)
[1.000, 0.000, 0.400, 0.500, 0.818, 0.111, 0.615] (False)
[1.000, 1.000, 0.625, 0.900, 0.857, 0.929, 0.909] (True)
[1.000, 1.000, 0.333, 1.000, 0.210, 0.100, 0.214] (True)
[1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000] (True)
[0.966, 1.000, 0.727, 0.125, 0.727, 0.200, 0.217] (True)
[0.667, 0.000, 0.350, 0.677, 0.737, 0.278, 0.810] (False)
[0.875, 1.000, 0.125, 0.188, 0.257, 0.171, 1.000] (False)
[1.000, 0.444, 1.000, 1.000, 0.859, 0.156, 0.207] (False)
[1.000, 0.000, 0.650, 0.357, 0.833, 0.000, 0.000] (False)
[1.000, 0.000, 0.421, 0.800, 0.375, 0.222, 0.229] (False)
[0.650, 1.000, 1.000, 0.350, 0.350, 0.300, 0.300] (False)
[0.958, 0.000, 0.750, 0.800, 0.750, 0.000, 0.000] (False)
[1.000, 1.000, 0.250, 0.167, 0.135, 0.143, 0.000] (False)
[1.000, 0.000, 0.583, 0.444, 0.944, 0.455, 0.789] (False)
[0.687, 1.000, 0.467, 0.500, 0.371, 0.552, 0.517] (False)
[0.680, 0.778, 0.125, 0.571, 0.500, 0.700, 0.667] (False)
[1.000, 1.000, 1.000, 1.000, 0.929, 1.000, 0.600] (True)
[0.850, 1.000, 0.733, 1.000, 0.588, 0.615, 0.632] (True)
[0.867, 1.000, 1.000, 1.000, 0.929, 0.487, 0.474] (True)
[0.267, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[0.666, 1.000, 0.150, 0.200, 0.132, 0.194, 0.438] (False)
[1.000, 1.000, 0.765, 0.692, 0.429, 0.778, 0.500] (True)
[1.000, 0.000, 0.789, 0.833, 0.174, 0.867, 0.714] (False)
[1.000, 0.556, 1.000, 0.929, 1.000, 0.220, 1.000] (False)
[1.000, 1.000, 0.205, 0.708, 0.757, 0.800, 0.806] (False)
[0.667, 0.000, 0.600, 0.857, 0.500, 0.700, 0.267] (False)
[0.533, 0.000, 0.556, 0.474, 0.750, 0.450, 0.391] (False)
[0.750, 0.000, 0.792, 0.667, 0.792, 0.130, 0.333] (False)
[0.833, 0.000, 0.542, 0.714, 0.600, 1.000, 0.813] (False)
[0.900, 1.000, 0.536, 0.195, 0.051, 0.206, 0.440] (False)
[0.500, 0.000, 0.684, 0.737, 0.250, 0.167, 0.417] (False)
[1.000, 0.000, 0.000, 0.857, 0.364, 0.571, 0.476] (False)
[0.600, 0.944, 0.382, 0.023, 0.303, 0.397, 0.147] (False)
[0.424, 1.000, 0.105, 0.077, 0.067, 0.833, 1.000] (False)
[0.917, 1.000, 1.000, 1.000, 1.000, 0.053, 1.000] (True)
[1.000, 1.000, 0.214, 1.000, 1.000, 1.000, 0.167] (False)
[1.000, 1.000, 0.698, 0.450, 0.703, 0.508, 0.557] (False)
[1.000, 1.000, 1.000, 0.524, 1.000, 1.000, 1.000] (True)
[0.707, 1.000, 1.000, 1.000, 1.000, 1.000, 0.941] (True)
[1.000, 0.000, 0.563, 0.731, 0.182, 0.452, 0.500] (False)
[1.000, 1.000, 0.810, 0.760, 0.800, 0.417, 0.833] (True)
[0.350, 1.000, 0.194, 0.193, 0.105, 0.176, 1.000] (False)
[1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 1.000] (True)
[1.000, 0.000, 0.542, 0.476, 0.300, 1.000, 0.500] (False)
[1.000, 0.000, 0.786, 0.857, 1.000, 0.194, 0.813] (False)
[1.000, 0.000, 0.545, 0.714, 0.700, 0.833, 0.462] (False)
[0.644, 1.000, 0.885, 0.727, 0.833, 0.750, 0.769] (True)
[1.000, 0.000, 0.500, 0.929, 0.900, 0.889, 0.929] (False)
[0.667, 0.000, 0.750, 0.667, 0.235, 0.722, 0.526] (False)
[0.220, 1.000, 0.500, 0.500, 0.500, 1.000, 1.000] (False)
[1.000, 1.000, 1.000, 0.138, 0.167, 0.143, 0.048] (False)
[0.733, 0.000, 0.176, 0.304, 0.135, 0.174, 0.125] (False)
[1.000, 0.556, 0.125, 0.182, 0.071, 0.167, 0.115] (False)
[1.000, 1.000, 1.000, 0.933, 0.167, 0.167, 1.000] (True)
[1.000, 1.000, 0.267, 0.642, 0.486, 0.474, 0.974] (True)
[1.000, 0.556, 0.941, 0.957, 0.958, 0.938, 0.979] (False)
[0.900, 1.000, 1.000, 0.909, 0.600, 0.875, 0.958] (True)
[1.000, 1.000, 0.375, 0.933, 0.313, 1.000, 1.000] (True)
[1.000, 0.000, 0.417, 0.357, 0.350, 0.412, 0.625] (False)
[0.533, 0.000, 0.667, 0.800, 0.857, 0.727, 0.652] (False)
[0.226, 1.000, 0.667, 0.667, 0.667, 0.667, 0.667] (False)
[0.338, 1.000, 0.591, 0.765, 0.818, 1.000, 1.000] (True)
[0.667, 0.000, 0.704, 0.300, 0.471, 0.750, 0.474] (False)
[1.000, 1.000, 0.450, 0.750, 0.200, 0.389, 0.684] (False)
[1.000, 1.000, 0.667, 0.250, 0.000, 0.857, 0.889] (True)
[0.767, 1.000, 0.636, 0.769, 0.176, 0.750, 0.929] (True)
[0.329, 1.000, 0.143, 0.048, 0.143, 0.162, 0.000] (False)
[1.000, 0.000, 0.857, 1.000, 0.278, 0.400, 0.333] (False)
[1.000, 1.000, 0.186, 0.575, 0.788, 0.407, 0.548] (False)
[0.520, 1.000, 0.923, 0.000, 0.083, 0.947, 1.000] (False)
[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000] (True)
[1.000, 0.000, 0.846, 0.583, 0.579, 0.364, 0.231] (False)
[1.000, 1.000, 0.439, 0.295, 0.156, 0.613, 0.583] (True)
[1.000, 0.000, 0.769, 0.905, 1.000, 0.636, 0.412] (False)
[1.000, 0.000, 0.292, 0.323, 0.800, 0.714, 0.714] (False)
[0.667,